THEORY OF LEARNING WITH CONSTANT, VARIABLE, OR
CONTINGENT PROBABILITIES OF REINFORCEMENT*


W. K. Estes 


INDIANA UNIVERSITY 


The methods used in recent probabilistic learning models to generate
mean curves of learning under random reinforcement are extended to the
general case in which probability of reinforcement may vary in any specified
manner as a function of trials and to cases in which probability of
reinforcement on a given trial is contingent upon responses or outcomes of
preceding trials.


Our purpose is to develop a general model for mean curves of learning
under random reinforcement in “determinate” situations. By “determinate”
we signify the following restrictions. In these situations the subject is
confronted with the same stimulating situation, e.g., a ready signal, at the
beginning of each trial. The subject responds with one of a specified set of
alternative responses, $(A_1, A_2, \ldots, A_r)$, and following his response is
presented with one of a specified set of reinforcing events,
$(E_1, E_2, \ldots, E_r)$, exactly one reinforcing event $E_j$ corresponding to
each possible response $A_j$. In a T-maze experiment (with correction
procedure), $A_1$ and $A_2$ correspond to left and right turns; $E_1$ and $E_2$
correspond to “food obtained on left” and “food obtained on right”,
respectively. In a simple prediction experiment with human subjects
[8, 8, 9, 10, 11, 13], the responses $(A_1, A_2, \ldots, A_r)$ correspond to the
subject’s predictions as to which of a set of “reinforcing lights”
$(E_1, E_2, \ldots, E_r)$ will appear on each trial; instructions are such
that the subject interprets the appearance of $E_j$ to mean that response $A_j$
was correct. It is further assumed that one can specify in advance of any
trial the probability that any given response will be followed by any given
reinforcing event.

From the set-theoretical model of Estes and Burke [4, 6] plus an
assumption of association by contiguity, it is possible (see [1, 8]) to derive
the following quantitative law describing the change in the probability of
response $A_j$ on any trial:

If $E_j$ occurs on trial $n$,

(1a)    $p_{j,n+1} = (1 - \theta)\,p_{j,n} + \theta$.

If $E_k$ ($k \neq j$) occurs on trial $n$,

(1b)    $p_{j,n+1} = (1 - \theta)\,p_{j,n}$.

The quantity $p_{j,n}$ represents the probability of response $A_j$ on trial $n$,
and $\theta$ is a parameter satisfying the restriction $0 \le \theta \le 1$. The
parameter $\theta$ may vary in value from one organism to another, and for a
given organism from one situation to another, but is assumed to remain
constant during any given experiment. Functional equations of the form (1a)
and (1b) may also be obtained from the stochastic learning model of Bush and
Mosteller [2] by imposing suitable restrictions on the parameters.

*This paper was prepared while the writer was in residence at the Center for Advanced
Study in the Behavioral Sciences, Stanford, California. The research on which it is based
was supported by a faculty research grant from the Social Science Research Council.
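The trial-by-trial law (1a) and (1b) can be sketched in a few lines of Python. This is only an illustration: the function names, the random seed, and the choice of a constant probability `pi` for the reinforcing event are assumptions made for the example, not part of the model statement.

```python
import random

def update(p, own_event_occurred, theta):
    """One-trial change in response probability: (1a) if E_j occurred, else (1b)."""
    if own_event_occurred:
        return (1 - theta) * p + theta   # (1a)
    return (1 - theta) * p               # (1b)

def simulate_subject(p_first, pi, theta, n_trials, seed=0):
    """Return [p_1, ..., p_n] for one simulated subject.

    Assumption for this sketch: E_j occurs with the same probability `pi`
    on every trial, independently of the subject's response.
    """
    rng = random.Random(seed)
    path = [p_first]
    for _ in range(n_trials - 1):
        path.append(update(path[-1], rng.random() < pi, theta))
    return path
```

Because each update is a convex combination of the old value with 0 or 1, every simulated value stays between zero and one whenever the starting value does.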

Now if we can specify the probabilities with which each of the events
$E_j$ will occur on each trial of a learning experiment, then, given the initial
probability of $A_j$, it becomes a purely mathematical problem to deduce
the expected value of $p_{j,n}$ on any trial and thus to generate a predicted
learning curve which can be compared with experimental curves. For two
special cases, the mathematical problem has already been solved and the
desired theoretical curves have been computed and fitted to data [1, 2, 8,
13]. In the first of these, which we shall call the simple non-contingent case,
the probability of $E_j$, hereafter designated $\pi_j$, has the same value on all
trials of the series regardless of the subject’s response. In the second of these,
which we shall call the simple contingent case, the probability of $E_j$ on any
trial depends upon which response is made by the subject. Thus if the subject
makes response $A_1$, the probability of $E_j$ is $\pi_{1j}$; if the subject makes
response $A_2$, the probability of $E_j$ is $\pi_{2j}$; and so on; but the values of
$\pi_{ij}$ remain fixed throughout the series of trials. Now we wish to obtain a
more general solution which will yield predicted curves for experiments in
which the constancy requirement is removed and the $\pi_j$ are permitted to vary
over a series of trials.


General Solution and Asymptotic Matching Theorem 


Let $\pi_{j,n}$ represent the probability that reinforcing event $E_j$ will occur
on trial $n$, with $\sum_j \pi_{j,n} = 1$ for all $n$. Then given that a subject’s
probability of making response $A_j$ on trial $n$ is $p_{j,n}$, the expected, or
mean, value of the probability* on trial $n + 1$ must be

(2)    $p_{j,n+1} = (1 - \theta)\,p_{j,n} + \theta\,\pi_{j,n}$.

To obtain (2) average the right-hand sides of (1a) and (1b), weighting them
by the probabilities $\pi_{j,n}$ and $[1 - \pi_{j,n}]$, respectively, that $E_j$ will
and will not occur.

*Throughout the paper, the quantity $p_j$ should be interpreted as follows. (a) In
equations dealing with learning on a particular trial, e.g., (1a) and (1b), $p_{j,n+1}$ represents
the new probability on trial $n + 1$ for a subject who had the value $p_{j,n}$ on trial $n$. (b) In
equations dealing with the expected change on a trial, e.g., (2), (2a), $p_{j,n+1}$ represents the
expected value of $p_j$ on trial $n + 1$, where the average is taken over all possible values of
$p_{j,n}$ and all possible outcomes of trial $n$; the term “all possible” is defined for any given
situation by the initial values of $p_j$ and the possible sequences of responses and reinforcing
events over the first $n$ trials. (c) In solutions giving $p_j$ as a function of $n$, e.g., (3), (3a),
$p_{j,n}$ is the expected value of $p_j$ on trial $n$, where the average is taken over all initial values
of $p_j$ and all possible sequences of responses and reinforcing events over the first $n - 1$
trials.

Some general asymptotic properties of the model can be clearly displayed
if we consider, not simply $p_{j,n}$, the probability of a response on a particular
trial, but the expected proportion of response occurrences over a series of
trials. The latter quantity, which we shall designate $\bar{p}_j(n)$, must of course
satisfy the relation

$\bar{p}_j(n) = \frac{1}{n} \sum_{\nu=1}^{n} p_{j,\nu}$.

Substituting into the right side of this expression from (2), we obtain

$\bar{p}_j(n) = \frac{p_{j,1}}{n} + \frac{1}{n} \sum_{\nu=1}^{n-1} \left[(1 - \theta)\,p_{j,\nu} + \theta\,\pi_{j,\nu}\right] = \frac{p_{j,1}}{n} + \frac{n-1}{n}\left[(1 - \theta)\,\bar{p}_j(n-1) + \theta\,\bar{\pi}_j(n-1)\right]$,

where $\bar{\pi}_j(n - 1)$ represents the expected proportion of $E_j$ reinforcing events
over the first $n - 1$ trials. For large $n$, the right side of the last expression
approaches the limit

$(1 - \theta)\,\bar{p}_j(n - 1) + \theta\,\bar{\pi}_j(n - 1)$.

Further, since $\bar{p}_j(n - 1)$ always differs from $\bar{p}_j(n)$ by a term of the order
of $1/n$, we can write, for sufficiently large $n$, the approximate equality

$\bar{p}_j(n) \cong (1 - \theta)\,\bar{p}_j(n) + \theta\,\bar{\pi}_j(n - 1)$,

or

$\bar{p}_j(n) \cong \bar{\pi}_j(n - 1)$.

Thus we find that no matter how $\pi_j$ varies over a series of trials, the cumulative
proportions of $A_j$ and $E_j$ occurrences tend to equality as $n$ becomes large.
It can be expected that this remarkably general “matching law” will play a
central role in empirical tests of the theory.
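A quick numerical check of the matching property, as a sketch only: the function name and the sinusoidal reinforcement schedule used in the usage note are arbitrary assumptions chosen to show that the result does not depend on the schedule being constant or monotone.

```python
import math

def cumulative_proportions(p_first, pi_of_n, theta, n):
    """Iterate (2) and return (mean of p over trials 1..n,
    mean of pi over trials 1..n-1)."""
    p, sum_p, sum_pi = p_first, 0.0, 0.0
    for trial in range(1, n + 1):
        sum_p += p
        if trial < n:
            pi_n = pi_of_n(trial)                   # pi_{j,n} for this trial
            sum_pi += pi_n
            p = (1 - theta) * p + theta * pi_n      # equation (2)
    return sum_p / n, sum_pi / (n - 1)

def wavy_schedule(n):
    """Illustrative schedule (an assumption): oscillates between .1 and .9."""
    return 0.5 + 0.4 * math.sin(n / 50)
```

For example, `cumulative_proportions(0.9, wavy_schedule, 0.05, 20000)` returns two proportions that agree closely, and the gap shrinks roughly in proportion to $1/(\theta n)$ as the series is lengthened.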

To study the pre-asymptotic course of learning, we proceed as follows.
Suppose that a subject begins an experiment with the probability $p_{j,1}$ of
making an $A_j$; then his expected probability on trial 2 will be, applying (2),

$p_{j,2} = (1 - \theta)\,p_{j,1} + \theta\,\pi_{j,1}$;

on trial 3,

$p_{j,3} = (1 - \theta)\,p_{j,2} + \theta\,\pi_{j,2} = (1 - \theta)^2 p_{j,1} + \theta(1 - \theta)\,\pi_{j,1} + \theta\,\pi_{j,2}$;

and, in general, on trial $n$,

(3)    $p_{j,n} = (1 - \theta)^{n-1} p_{j,1} + \theta(1 - \theta)^{n-2}\pi_{j,1} + \theta(1 - \theta)^{n-3}\pi_{j,2} + \cdots + \theta\,\pi_{j,n-1}$
       $\phantom{p_{j,n}} = (1 - \theta)^{n-1} p_{j,1} + \theta \sum_{\nu=1}^{n-1} (1 - \theta)^{n-\nu-1}\,\pi_{j,\nu}$.

A number of important features that will characterize the mean learning
curve regardless of the nature of the function $\pi_{j,n}$ can be ascertained by
inspection of (2) and (3). If the value of $\theta$ is zero, no learning will occur;
in the remainder of the paper this case will be excluded from all derivations.
If the value of $\theta$ is greater than zero then learning will occur. By rewriting
(2) in the form

$p_{j,n+1} = p_{j,n} + \theta(\pi_{j,n} - p_{j,n})$,

we see that on the average, response probability on any trial changes in the
direction of the current value of $\pi_j$. As $n$ becomes large, the term
$(1 - \theta)^{n-1} p_{j,1}$ in (3) tends to zero. After $n$ is large enough so that
$(1 - \theta)^{n-1} p_{j,1}$ is negligible, $p_{j,n}$ is essentially a weighted mean of the
$\pi_j$ values which obtained on preceding trials, with $\pi_{j,n-1}$ having most
weight, $\pi_{j,n-2}$ less weight, and so on. If $\pi_{j,n}$ is some orderly function of
$n$, as for example a straight line or a growth function, then the curve for
$p_{j,n}$ tends to approach this function as $n$ increases, but always “follows it
with a lag.” If rate of learning is maximal, i.e., $\theta$ is equal to one, then
$p_{j,n}$ is simply equal to $\pi_{j,n-1}$ throughout the series of trials; the more
$\theta$ deviates from one, the more the curve for $p_{j,n}$ lags behind that for
$\pi_{j,n}$.
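The equivalence of the recursion (2) and the summed form (3) can be checked directly. The sketch below is illustrative; the function names and the schedule used in the usage note are assumptions.

```python
def iterate_mean(p_first, pis, theta):
    """Iterate (2). `pis` holds pi_{j,1}, ..., pi_{j,N};
    the result is [p_{j,1}, ..., p_{j,N+1}]."""
    ps = [p_first]
    for pi_n in pis:
        ps.append((1 - theta) * ps[-1] + theta * pi_n)
    return ps

def closed_form(p_first, pis, theta, n):
    """Evaluate (3) for trial n (1-indexed), using pi_{j,1}, ..., pi_{j,n-1}."""
    total = (1 - theta) ** (n - 1) * p_first
    for v in range(1, n):
        total += theta * (1 - theta) ** (n - v - 1) * pis[v - 1]
    return total
```

With `theta = 1.0` the iterated curve simply reproduces the schedule one trial late, which is the "maximal rate of learning" case described above; smaller values of `theta` spread the weight over more of the preceding trials.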

We may gain further insight into this learning process and at the same
time develop functions that will be useful in experimental applications by
considering some special cases in which $\pi_{j,n}$ can be represented by familiar
functions with simple properties.


Non-Contingent Case 


a. The special case of $\pi_{j,n}$ constant

If $\pi_j$ is constant, then as one might expect, (2) and (3) reduce to the
simple expressions

(2a)    $p_{j,n+1} = (1 - \theta)\,p_{j,n} + \theta\,\pi_j$,

and

(3a)    $p_{j,n} = \pi_j - (\pi_j - p_{j,1})(1 - \theta)^{n-1}$,

derived by Estes and Straughan [8] from the set-theoretical model [4, 6]
and, with slightly different notation, by Bush and Mosteller [2] from their
“linear operator” model. In this case the predicted learning curve is given
by a negatively accelerated function tending to $\pi_j$ asymptotically.
Experimental applications of (3a) are described in references [2, 3, 5, 6, 13].
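One practical consequence of (3a) is that the number of trials needed to come within a given distance of the asymptote follows from a logarithm. The sketch below works this out; the helper name `trials_to_within` and the tolerance argument are assumptions introduced for illustration, and the formula requires $0 < \theta < 1$.

```python
import math

def p_constant(n, p_first, pi, theta):
    """Closed form (3a) for constant probability of reinforcement."""
    return pi - (pi - p_first) * (1 - theta) ** (n - 1)

def trials_to_within(eps, p_first, pi, theta):
    """Smallest trial n at which |pi - p_n| <= eps under (3a).

    From (3a), |pi - p_n| = |pi - p_1| (1 - theta)^(n - 1), so
    n - 1 >= log(eps / |pi - p_1|) / log(1 - theta).
    Assumes 0 < theta < 1 and eps > 0.
    """
    gap = abs(pi - p_first)
    if gap <= eps:
        return 1
    return 1 + math.ceil(math.log(eps / gap) / math.log(1 - theta))
```

Since the bound depends on $\theta$ only through $\log(1 - \theta)$, halving a small $\theta$ roughly doubles the number of trials required.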


b. The special case of $\pi_{j,n}$ linear

We shall treat this case in some detail since it has a number of properties
that will be especially convenient for experimental tests of the theory. The
linear function

$\pi_{j,n} = a_j + b_j n$,

$a_j$ and $b_j$ being constants, is not in general bounded between zero and one
for all $n$; for experimental purposes, however, one need only choose values
of $a_j$ and $b_j$ which, for the number of trials to be given, keep the value of
$\pi_{j,n}$ within the required range. Subject to this restriction, we may substitute
into (2) and (3) to obtain the expected response probability on any trial,

(2b)    $p_{j,n+1} = (1 - \theta)\,p_{j,n} + \theta(a_j + b_j n)$,

and

(3b)    $p_{j,n} = a_j + b_j n - \frac{b_j}{\theta} - \left(a_j + b_j - \frac{b_j}{\theta} - p_{j,1}\right)(1 - \theta)^{n-1}$.

In the interest of brevity we have omitted the detailed steps involved in
summing the series in (3); the method of performing the summation in this
case, and in others to be considered in following sections, is given in standard
sources [12, 14]. The reader can verify that (3b) is the correct solution to
(2b) by substituting the former into the latter. The main properties of (3b)
are illustrated in Fig. 1. Regardless of the initial value $p_{j,1}$, after a sufficiently
large number of trials the curve for $p_{j,n}$ approaches a straight line,

$p_{j,n} = a_j - \frac{b_j}{\theta} + b_j n$,

which has the same slope as the straight line representing $\pi_{j,n}$. If the initial
value of $p_{j,n}$ is greater than $\pi_{j,n}$ and the slope of $\pi_{j,n}$ is positive, $p_{j,n}$ will
decrease until its curve crosses the line $\pi_{j,n}$, following which it will increase;
if $b_j$ is small, the point of crossing will be approximately at the minimum
value of $p_{j,n}$. To prove the last statement, we replace $n$ by a continuous
variable $t$, then set the derivative of $p_{j,t}$ with respect to $t$ equal to zero and
find that $p_{j,t}$ has as its minimum value
$p_{j,t_m} = a_j - \frac{b_j}{\theta} + b_j t_m - \frac{b_j}{\log(1 - \theta)}$,

where

$t_m = \frac{\log b_j - \log \log (1 - \theta)^{K}}{\log(1 - \theta)}$

and

$K = \frac{a_j - \frac{(1 - \theta)\,b_j}{\theta} - p_{j,1}}{1 - \theta}$.

Figure 1
Curves describing changes in response probability when probability of reinforcement
varies linearly with trials. The parameter $\theta$ has been taken equal to .05.
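Subtracting $\pi_{j,t_m}$ from the minimum value of $p_j$, we find that the difference
is equal to

$-\frac{b_j}{\theta} - \frac{b_j}{\log(1 - \theta)} = b_j\left[-\frac{1}{\theta} + \frac{1}{\theta + \frac{\theta^2}{2} + \frac{\theta^3}{3} + \cdots}\right] = -b_j\,\frac{\frac{1}{2} + \frac{\theta}{3} + \frac{\theta^2}{4} + \cdots}{1 + \frac{\theta}{2} + \frac{\theta^2}{3} + \cdots}$,

which is negative and does not exceed $b_j$ in absolute value for any value of $\theta$.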
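The properties claimed for the linear case can be illustrated numerically. The following sketch assumes illustrative parameter values and function names; it evaluates (3b), the terminal straight line, and the location of the minimum of the curve when the initial response probability lies above the reinforcement line.

```python
def pi_linear(n, a, b):
    """Probability of reinforcement on trial n, linear in n."""
    return a + b * n

def p_linear(n, p_first, a, b, theta):
    """Closed form (3b)."""
    transient = (a + b - b / theta - p_first) * (1 - theta) ** (n - 1)
    return a + b * n - b / theta - transient

def terminal_line(n, a, b, theta):
    """Straight line approached by (3b) once the transient has died out."""
    return a - b / theta + b * n

def trial_of_minimum(p_first, a, b, theta, n_max):
    """Trial (1-indexed) at which p_{j,n} from (3b) is smallest over 1..n_max."""
    return min(range(1, n_max + 1), key=lambda n: p_linear(n, p_first, a, b, theta))
```

With `p_first = 0.5`, `a = 0.2`, `b = 0.002`, and `theta = 0.05`, the curve falls at first, turns upward near the trial where it meets the reinforcement line, and then runs parallel to that line at a vertical distance of `b / theta` below it.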
To obtain an expression for $R_j(n)$, the cumulative number of $A_j$ responses
expected in $n$ trials, we need only sum (3b):

(4)    $R_j(n) = \sum_{\nu=1}^{n} p_{j,\nu} = n\left(a_j - \frac{b_j}{\theta}\right) + \frac{b_j\,n(n + 1)}{2} - \left(a_j + b_j - \frac{b_j}{\theta} - p_{j,1}\right)\frac{1 - (1 - \theta)^{n}}{\theta}$.

Similarly, by summing (3b) over the $m$th block of $k$ trials and dividing by
$k$, we obtain the expected proportion of $A_j$ responses in the block:

(5)    $\bar{p}_j(k, m) = a_j - \frac{b_j}{\theta} + \frac{b_j}{2}(2mk - k + 1) - \left(a_j + b_j - \frac{b_j}{\theta} - p_{j,1}\right)\frac{\left[1 - (1 - \theta)^{k}\right](1 - \theta)^{k(m-1)}}{k\theta}$.

Equation (5), despite its cumbersome appearance, has essentially the same
properties as (3b) and can readily be fitted to experimental data. For a
block of $k$ trials beginning with a value of $n$ large enough so that
$(1 - \theta)^{k(m-1)}$ is near zero, we have the approximation

$\bar{p}_j(k, m) \cong a_j - \frac{b_j}{\theta} + \frac{b_j}{2}(2mk - k + 1)$.

By substituting the observed value of $\bar{p}_j(k, m)$ from a set of experimental
data and solving for $\theta$, we obtain an estimate of this parameter which, although
not unbiased, will be adequate for many experimental purposes.
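The estimation step can be sketched as follows. Solving the late-block approximation for $\theta$ gives $\theta \cong b_j / [a_j + \tfrac{b_j}{2}(2mk - k + 1) - \bar{p}_j(k, m)]$. The function names are assumptions, and the "observed" block proportion in the usage note is generated from the model itself rather than taken from data.

```python
def block_mean_by_iteration(p_first, a, b, theta, k, m):
    """Mean of p over trials (m-1)k+1 .. mk, obtained by iterating (2b)."""
    p, total = p_first, 0.0
    for n in range(1, m * k + 1):
        if n > (m - 1) * k:
            total += p
        p = (1 - theta) * p + theta * (a + b * n)
    return total / k

def estimate_theta(p_bar, a, b, k, m):
    """Solve the late-block approximation to (5) for theta.

    `p_bar` is the observed proportion of A_j responses in block m.
    Valid only when (1 - theta)^(k(m-1)) is negligible and b != 0.
    """
    mean_pi = a + 0.5 * b * (2 * m * k - k + 1)   # mean of pi over block m
    return b / (mean_pi - p_bar)
```

For instance, a block proportion generated with `theta = 0.05`, `a = 0.2`, `b = 0.002`, `k = 20`, `m = 10` yields an estimate very close to .05. With real data the denominator is a small difference between two proportions, so the estimate will be sensitive to sampling error when `b` is small.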


c. The special case $\pi_{j,n} = a_j + c_j b_j^{n}$

Among the possible monotone relations between $\pi_j$ and $n$, the second
main type of interest is that in which $\pi_{j,n}$ approaches an asymptote. This
type will be represented by the function $\pi_{j,n} = a_j + c_j b_j^{n}$, the values of
the constants $a_j$, $b_j$, and $c_j$ being so restricted that $\pi_{j,n}$ is properly
bounded between zero and one for all $n$.

Equations (2) and (3) now take the forms

(2c)    $p_{j,n+1} = (1 - \theta)\,p_{j,n} + \theta(a_j + c_j b_j^{n})$;

and, if $b_j \neq 1 - \theta$,

(3c)    $p_{j,n} = a_j + \frac{\theta c_j b_j^{n}}{b_j - 1 + \theta} - \left(a_j + \frac{\theta c_j b_j}{b_j - 1 + \theta} - p_{j,1}\right)(1 - \theta)^{n-1}$,

or, if $b_j = 1 - \theta$,

$p_{j,n} = a_j + c_j\theta(n - 1)(1 - \theta)^{n-1} - (a_j - p_{j,1})(1 - \theta)^{n-1}$.

Some properties of (3c) are illustrated in Fig. 2. In the upper panel, $a_j$ has
been taken equal to .50, $c_j$ to 1.0, and $b_j$ to .98 so that $\pi_{j,n}$ describes a
negatively accelerated decreasing curve approaching .50 asymptotically.
The effect of changing the sign of $b_j$ from positive to negative can be seen
by comparing the lower panel of Fig. 2, which has $b_j = -.98$, $a_j = .50$, and
$c_j = 1.0$, with the upper panel. Now the values of $\pi_j$ oscillate from trial to
trial between a pair of curves, the upper envelope being identical with the
$\pi_{j,n}$ curve in the upper panel and the lower envelope curve the mirror image of
it. The values of $p_{j,n}$ describe a damped oscillation around an exponential
function; for any given set of parameter values, the values of $p_{j,n}$ will be
alternately above and below those of the curve

$p_{j,n} = a_j - \left(a_j + \frac{\theta c_j b_j}{b_j - 1 + \theta} - p_{j,1}\right)(1 - \theta)^{n-1}$,

with the deviation from the smooth curve decreasing progressively in
magnitude toward zero as $n$ increases.

Figure 2
Curves describing changes in response probability when probability of reinforcement
varies exponentially with trials. The parameter $\theta$ has been taken equal to .05.
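Both branches of (3c) can be checked against the recursion (2c). The sketch below is illustrative only: the function names and the tolerance used to detect the degenerate case $b_j = 1 - \theta$ are assumptions.

```python
def p_exponential(n, p_first, a, b, c, theta):
    """Closed form (3c) for pi_{j,n} = a + c * b**n, including the case b = 1 - theta."""
    decay = (1 - theta) ** (n - 1)
    if abs(b - (1 - theta)) < 1e-12:              # degenerate branch of (3c)
        return a + c * theta * (n - 1) * decay - (a - p_first) * decay
    g = theta * c / (b - 1 + theta)
    return a + g * b ** n - (a + g * b - p_first) * decay

def iterate_exponential(p_first, a, b, c, theta, n_trials):
    """Iterate (2c); returns [p_1, ..., p_{n_trials}]."""
    ps = [p_first]
    for n in range(1, n_trials):
        ps.append((1 - theta) * ps[-1] + theta * (a + c * b ** n))
    return ps
```

With a negative `b`, the term `g * b ** n` changes sign on alternate trials, which produces the damped oscillation around the smooth exponential curve described above.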

A formula for the expected number of $A_j$ responses in $n$ trials can be
obtained and utilized for estimation of $\theta$ as in case (b).


d. A periodic case 

From an analysis of the general solution in section (a) above, we can
predict that if $\pi_j$ varies in accordance with a periodic function, then
asymptotically the curve for $p_{j,n}$ will be described by a periodic function having
the same period. A simple case with convenient properties for experimental
purposes is the following: $\pi_j$ is constant within any one block of $k$ trials, but
alternates between two values, say $a_j + b_j$ and $a_j - b_j$, on successive blocks
so that the value of $\pi_j$ on each trial of the $m$th block is given by

$\pi_j = a_j + b_j(-1)^{m}$.

The value of $p_j$ at the end of the $m$th block can be taken directly from section
(a) above:

$p_{j,mk+1} = a_j + b_j(-1)^{m} - \left[a_j + b_j(-1)^{m} - p_{j,(m-1)k+1}\right](1 - \theta)^{k}$.

Treating blocks of $k$ trials as units, this expression may be viewed as a
difference equation of the same form as (2). Substituting $a_j + b_j(-1)^{m}$,
$(1 - \theta)^{k}$, and $mk$ for the corresponding terms $\pi_{j,n}$, $(1 - \theta)$, and $n$ of
(2) and (3), we obtain the solution

(3d)    $p_{j,mk+1} = a_j + b_j(-1)^{m}\,\frac{1 - (1 - \theta)^{k}}{1 + (1 - \theta)^{k}} - \left\{a_j + b_j\,\frac{1 - (1 - \theta)^{k}}{1 + (1 - \theta)^{k}} - p_{j,1}\right\}(1 - \theta)^{mk}$.

Equation (3d) gives us the expected value of $p_j$ at the end of the $m$th trial
block. Using (3a) of section (a) again, we have for the expected value of
$p_j$ on the $n'$th trial of the $(m + 1)$st block

(3e)    $p_{j,mk+n'} = a_j + b_j(-1)^{m+1} - \left[a_j + b_j(-1)^{m+1} - p_{j,mk+1}\right](1 - \theta)^{n'-1}$.
Properties of this solution are illustrated in Fig. 3. It can be seen that
regardless of its initial value, $p_j$ settles down to a periodic function with
period $2k$, the period of the reinforcement schedule.

Figure 3
Curves describing changes in response probability when probability of reinforcement
varies periodically with trials. The parameter $\theta$ has been taken equal to .05.
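The block-end solution (3d) can be compared with a trial-by-trial iteration of (2) under the alternating schedule. This is a sketch under assumed parameter values; the function names are hypothetical.

```python
def p_block_end(m, p_first, a, b, k, theta):
    """(3d): expected p on trial mk + 1, i.e. just after the m-th block of k trials."""
    alpha = (1 - theta) ** k
    amplitude = b * (1 - alpha) / (1 + alpha)
    return a + amplitude * (-1) ** m - (a + amplitude - p_first) * alpha ** m

def simulate_blocks(p_first, a, b, k, theta, n_blocks):
    """Iterate (2) with pi = a + b(-1)^m on every trial of block m (m = 1, 2, ...).
    Returns the expected p just after each block."""
    p, ends = p_first, []
    for m in range(1, n_blocks + 1):
        pi = a + b * (-1) ** m
        for _ in range(k):
            p = (1 - theta) * p + theta * pi
        ends.append(p)
    return ends
```

After the transient term has died out, the block-end values alternate between `a + amplitude` and `a - amplitude`, so the swing in response probability is always smaller than the swing `2b` in the reinforcement schedule, and smaller still when blocks are short or `theta` is small.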








122 PSYCHOMETRIKA 


e. Outcome contingencies 


Many cases in which the probability of a given reinforcing event on any
trial depends on the outcome (reinforcing event) of some preceding trial
can be reduced to cases already considered. Suppose, for example, that we
set the probability of $E_1$ on any trial equal to $\pi_{11}$ if an $E_1$ occurred on the
$\nu$th preceding trial and to $\pi_{21}$ if an $E_2$ occurred on the $\nu$th preceding
trial. Then we can write the following difference equation for $\pi_{1,n}$, the
expected probability of $E_1$ on trial $n$,

$\pi_{1,n} = \pi_{1,n-\nu}\,\pi_{11} + (1 - \pi_{1,n-\nu})\,\pi_{21} = (\pi_{11} - \pi_{21})\,\pi_{1,n-\nu} + \pi_{21}$,

which has the general solution

$\pi_{1,n} = \pi_1 + C_1 r_1^{n} + C_2 r_2^{n} + \cdots + C_\nu r_\nu^{n}$.

The $C_i$ are constants to be evaluated from the initial conditions of the
experiment; the $r_i$ are roots of the characteristic equation

$r^{\nu} - \pi_{11} + \pi_{21} = 0$;

and $\pi_1$, the asymptotic value of $\pi_{1,n}$, is given by

$\pi_1 = \frac{\pi_{21}}{1 - \pi_{11} + \pi_{21}}$.

If $\nu = 1$, i.e., the probability of a given outcome depends on the outcome of
the preceding trial, the formula for $\pi_{1,n}$ reduces to

$\pi_{1,n} = \pi_1 - (\pi_1 - \pi_{1,1})(\pi_{11} - \pi_{21})^{n-1}$.

Once a formula for $\pi_{j,n}$ has been deduced, it may be substituted into (2),
and the machinery already developed for non-contingent cases with varying
probabilities of reinforcement can be applied to generate predictions about
the course of learning. In the case $\nu = 1$, the difference equation for $p_{1,n}$
and its solution will be given by (2c) and (3c), respectively, with $a = \pi_1$
and $b = \pi_{11} - \pi_{21}$; this case has been discussed in some detail by Bush and
Mosteller [2].
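For the case $\nu = 1$ the reduction to a non-contingent schedule can be sketched directly. The function names are assumptions; `pi_first` stands for $\pi_{1,1}$, the probability of $E_1$ on the first trial, which the experimenter must fix separately.

```python
def pi_outcome(n, pi_first, pi11, pi21):
    """Expected probability of E_1 on trial n when the outcome of trial n - 1
    sets the probability to pi11 (after E_1) or pi21 (after E_2)."""
    pi_inf = pi21 / (1 - pi11 + pi21)
    return pi_inf - (pi_inf - pi_first) * (pi11 - pi21) ** (n - 1)

def iterate_pi(pi_first, pi11, pi21, n_trials):
    """Iterate the difference equation for pi_{1,n} with lag 1."""
    pis = [pi_first]
    for _ in range(n_trials - 1):
        pis.append((pi11 - pi21) * pis[-1] + pi21)
    return pis
```

The resulting sequence has the form $a + c\,b^{n}$ with $b = \pi_{11} - \pi_{21}$, which is why the learning curve for this case is supplied by (2c) and (3c).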

It should be emphasized that functions derived from the present model
for outcome contingencies with $\nu = 1$ will generally provide satisfactory
descriptions of empirical relationships only if the experiments are conducted
with well-spaced trials. According to this model, the asymptotic conditional
probabilities of $A_1$ on trials following $E_1$ and $E_2$ occurrences, respectively,
are given by

$p_{11} = \pi_1 + \theta(1 - \pi_1)$

and

$p_{21} = \pi_1 - \theta\pi_1$.

When trials are adequately spaced, these relations may prove to be empirically
confirmable, but if intertrial intervals are small enough so that the subject
can form a discrimination based on the differential stimulus after-effects
of $E_1$ and $E_2$ trials, then the asymptotic conditional probabilities will certainly
approach $\pi_{11}$ and $\pi_{21}$. A model for the massed-trial case can be derived from
a set-theoretical model for discrimination learning [1, 7]. Although a detailed
presentation of the discrimination model would be beyond the scope of this
paper, it is interesting to note that the discrimination model yields the same
asymptotic value for the over-all mean value of $p_1$ as the present model,
but yields asymptotic means for $p_{11}$ and $p_{21}$ which differ from $\pi_{11}$ and
$\pi_{21}$, respectively, only by terms which are smaller than $\theta$.


Contingent Case 


Let $\pi_{ij,n}$ represent the probability that reinforcing event $E_j$ will occur
on trial $n$ of a series given that the subject makes response $A_i$ on this trial,
and assume that $\sum_j \pi_{ij,n} = 1$ for all $i$ and $n$. Then to obtain the expected
value of $p_{j,n+1}$ as a function of the value on trial $n$, we again average the
right-hand sides of (1a) and (1b), weighting each of the possible outcomes
by its probability of occurrence, viz.,

(6)    $p_{j,n+1} = (1 - \theta)\,p_{j,n} + \theta \sum_i p_{i,n}\,\pi_{ij,n}$.


a. General solution for the case of two response classes 


If there are only two response classes, $A_1$ and $A_2$, with corresponding
reinforcing events, $E_1$ and $E_2$, defined for a given situation, then we have for
the expected probability of $A_1$ on the second trial of a series,

$p_{1,2} = (1 - \theta)\,p_{1,1} + \theta\left[p_{1,1}\pi_{11,1} + (1 - p_{1,1})\pi_{21,1}\right] = (1 - \theta + \theta\pi_{11,1} - \theta\pi_{21,1})\,p_{1,1} + \theta\pi_{21,1}$;

on the third trial

$p_{1,3} = (1 - \theta)\,p_{1,2} + \theta\left[p_{1,2}\pi_{11,2} + (1 - p_{1,2})\pi_{21,2}\right] = \alpha_1\alpha_2\,p_{1,1} + \alpha_2\theta\pi_{21,1} + \theta\pi_{21,2}$,

where we have introduced the abbreviation

$\alpha_n = 1 - \theta + \theta\pi_{11,n} - \theta\pi_{21,n}$.

In general on the $n$th trial,

(7)    $p_{1,n} = p_{1,1}\,\alpha_1\alpha_2 \cdots \alpha_{n-1} + \theta\,\alpha_1\alpha_2 \cdots \alpha_{n-1} \sum_{u=1}^{n-1} \frac{\pi_{21,u}}{\alpha_1\alpha_2 \cdots \alpha_u}$.

Since each of the $\alpha_n$ is a fraction between zero and one, we can see by
inspection of (7) that $p_{1,n}$ becomes independent of its initial value, $p_{1,1}$, as
$n$ becomes large; on later trials it is essentially equal to a weighted mean of
the $\pi_{21}$ values which obtained on preceding trials, with $\pi_{21,n-1}$ having most
weight, $\pi_{21,n-2}$ less weight, and so on. [If $\pi_{11} = 1$ and $\pi_{21} = 0$, then
$\alpha = 1$ and (6) reduces to

$p_{1,n+1} = p_{1,n}$,

i.e., on the average no learning occurs. In all derivations presented, we shall
assume this case to be excluded.] The smaller the average difference between
$\pi_{11,n}$ and $\pi_{21,n}$, the more completely is the value of $p_{1,n}$ determined by the
$\pi_{ij}$ values of a few immediately preceding trials. As in the non-contingent
case, the dependence of $p_{1,n}$ on the sequence of $\pi_{ij}$ values might be described
as “tracking with a lag,” but in this instance it will be necessary to study
some special cases in order to see just what is being “tracked.” For
convenience in exposition we shall limit ourselves to situations involving
two response classes while describing the special cases. In a later section we
shall indicate how all of the results can be extended to situations involving
more than two response classes.


b. The special case of $\pi_{ij}$ constant

If $\pi_{11,n}$ and $\pi_{21,n}$ are both constant, then (6) and (7) reduce to the
expressions

(6a)    $p_{j,n+1} = (1 - \theta)\,p_{j,n} + \theta \sum_{i=1}^{2} p_{i,n}\,\pi_{ij}$,

and

(7a)    $p_{1,n} = \frac{\pi_{21}}{1 - \pi_{11} + \pi_{21}} - \left(\frac{\pi_{21}}{1 - \pi_{11} + \pi_{21}} - p_{1,1}\right)(1 - \theta + \theta\pi_{11} - \theta\pi_{21})^{n-1}$,

previously derived by Estes [5] from the set-theoretical model and by Bush
and Mosteller [2] from their “linear operator” model. Experimental
applications of (7a) are described in references [2, 5, 13].
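As a sketch, the contingent recursion (6) for two response classes can be iterated and compared with the closed form (7a). Function names and the parameter values in the usage note are illustrative assumptions.

```python
def p_contingent_constant(n, p_first, pi11, pi21, theta):
    """Closed form (7a) for two response classes with constant pi_ij."""
    asymptote = pi21 / (1 - pi11 + pi21)
    alpha = 1 - theta + theta * pi11 - theta * pi21
    return asymptote - (asymptote - p_first) * alpha ** (n - 1)

def iterate_contingent(p_first, pi11_seq, pi21_seq, theta):
    """Iterate (6) for response A_1. The sequences give pi_{11,n} and pi_{21,n}
    for n = 1, 2, ...; the result is [p_{1,1}, p_{1,2}, ...]."""
    ps = [p_first]
    for pi11, pi21 in zip(pi11_seq, pi21_seq):
        p = ps[-1]
        ps.append((1 - theta) * p + theta * (p * pi11 + (1 - p) * pi21))
    return ps
```

With `pi11 = 0.7` and `pi21 = 0.2` the curve tends to `0.2 / (1 - 0.7 + 0.2) = 0.4`. The rate constant `alpha` depends on the reinforcement probabilities as well as on `theta`, unlike the non-contingent case.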
c. Special cases leading to linear difference equations with constant coefficients

Examination of (6) reveals that it will take the form of a linear difference
equation with constant coefficients whenever $\pi_{11,n}$ and $\pi_{21,n}$ differ only by a
constant. Thus, if

$\pi_{11,n} = a_{11} + g_n$

and

$\pi_{21,n} = a_{21} + g_n$,

where $g_n$ is any function that keeps $\pi_{ij,n}$ properly bounded for the range of $n$
under consideration, then (7) has the form

(7b)    $p_{1,n} = p_{1,1}\,\alpha^{n-1} + \theta\,\alpha^{n-1} \sum_{u=1}^{n-1} \frac{a_{21} + g_u}{\alpha^{u}}$
        $\phantom{p_{1,n}} = p_{1,1}\,\alpha^{n-1} + \frac{a_{21}}{1 - a_{11} + a_{21}}\left(1 - \alpha^{n-1}\right) + \theta\,\alpha^{n-1} \sum_{u=1}^{n-1} \frac{g_u}{\alpha^{u}}$,

where $\alpha = (1 - \theta + \theta a_{11} - \theta a_{21})$. For experimental purposes, it will usually
be most convenient to make $g_n$ a linear function of $n$, say $g_n = bn$, in which
case we can perform the summation in (7b) and obtain a simple closed
formula for $p_{1,n}$, viz.,

(7c)    $p_{1,n} = \frac{a_{21} + bn}{1 - a_{11} + a_{21}} - \frac{b}{\theta(1 - a_{11} + a_{21})^{2}} - \left[\frac{a_{21} + b}{1 - a_{11} + a_{21}} - \frac{b}{\theta(1 - a_{11} + a_{21})^{2}} - p_{1,1}\right](1 - \theta + \theta a_{11} - \theta a_{21})^{n-1}$.


The properties of (7c) are very similar to those of (3b), the corresponding
solution for the non-contingent case. Regardless of the initial value $p_{1,1}$,
after a sufficiently large number of trials, the curve for $p_{1,n}$ approaches the
straight line

$p_{1,n} = \frac{a_{21} + bn}{1 - a_{11} + a_{21}} - \frac{b}{\theta(1 - a_{11} + a_{21})^{2}}$.

Since $\theta$ is the only free parameter in the latter expression, its value can be
estimated by fitting the straight line to data obtained from a block of trials
relatively late in the learning series. It becomes apparent now, incidentally,
what it is that the $p_{1,n}$ curve “tracks with a lag.” The first term on the
right-hand side of (7c) is simply $\pi_{21,n}/(1 - \pi_{11,n} + \pi_{21,n})$. Thus at any moment,
the slope of the $p_{1,n}$ curve is such that it would approach
$\pi_{21}/(1 - \pi_{11} + \pi_{21})$, the asymptote of the constant $\pi_{ij}$ solution, (7a), if
$\pi_{11,n}$ and $\pi_{21,n}$ were to remain constant from that moment on. Since the
$\pi_{ij,n}$ do not remain constant, the subject’s curve tracks the “moving
asymptote” with a lag which depends inversely on $\theta$. As in the corresponding
non-contingent case, the slope of the terminal linear portion of the $p_{1,n}$ curve
can be predicted in advance of an experiment since it depends only on the
values of $a_{11}$, $a_{21}$, and $b$, which are assigned by the experimenter.
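The "moving asymptote" can be made concrete with a short numerical sketch. The function names and parameter values are assumptions; `moving_asymptote` evaluates $\pi_{21,n}/(1 - \pi_{11,n} + \pi_{21,n})$ for $g_n = bn$.

```python
def p_contingent_linear(n, p_first, a11, a21, b, theta):
    """Closed form (7c) for pi_{11,n} = a11 + b n and pi_{21,n} = a21 + b n."""
    d = 1 - a11 + a21
    alpha = 1 - theta * d                  # equals 1 - theta + theta*a11 - theta*a21
    lag = b / (theta * d ** 2)
    transient = ((a21 + b) / d - lag - p_first) * alpha ** (n - 1)
    return (a21 + b * n) / d - lag - transient

def moving_asymptote(n, a11, a21, b):
    """pi_{21,n} / (1 - pi_{11,n} + pi_{21,n}) for the linear schedule."""
    return (a21 + b * n) / (1 - a11 + a21)
```

Late in the series the curve runs below the moving asymptote by the constant amount `b / (theta * d ** 2)`, so a smaller `theta` means a larger lag, and the terminal slope `b / d` does not involve `theta` at all.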


d. Contingent case with more than two response classes 


The results of the preceding section can be extended without difficulty
to situations involving more than two response classes. If $\pi_{ij,n} = a_{ij} + g_n$
for all $i$ ($i = 1, 2, \ldots, r$), then for a situation involving $r$ response classes,
we obtain by application of (6) the system of $r$ difference equations

(8)    $p_{1,n+1} = (1 - \theta + \theta a_{11})\,p_{1,n} + \theta a_{21}\,p_{2,n} + \cdots + \theta a_{r1}\,p_{r,n} + \theta g_n$
       $p_{2,n+1} = \theta a_{12}\,p_{1,n} + (1 - \theta + \theta a_{22})\,p_{2,n} + \cdots + \theta a_{r2}\,p_{r,n} + \theta g_n$
       $\cdots$
       $p_{r,n+1} = \theta a_{1r}\,p_{1,n} + \theta a_{2r}\,p_{2,n} + \cdots + (1 - \theta + \theta a_{rr})\,p_{r,n} + \theta g_n$,

which must be solved simultaneously in order to obtain the desired formulas
for $p_{j,n}$. To facilitate the solution, we define an operator $E$ as follows:

$E\,p_{j,n} = p_{j,n+1}$.

Then the system (8) can be rewritten in the form:

$(E - 1 + \theta - \theta a_{11})\,p_{1,n} - \theta a_{21}\,p_{2,n} - \cdots - \theta a_{r1}\,p_{r,n} = \theta g_n$
$-\theta a_{12}\,p_{1,n} + (E - 1 + \theta - \theta a_{22})\,p_{2,n} - \cdots - \theta a_{r2}\,p_{r,n} = \theta g_n$
$\cdots$
$-\theta a_{1r}\,p_{1,n} - \theta a_{2r}\,p_{2,n} - \cdots + (E - 1 + \theta - \theta a_{rr})\,p_{r,n} = \theta g_n$.

Now the symbol $E$ may be treated as a number while we proceed to solve the
system of equations by standard methods. The solution will express each of
the $p_{j,n}$ as a polynomial in powers of $E$. Then to obtain a formula expressing
$p_{j,n}$ as an explicit function of $n$, we will have only to solve a linear difference
equation with constant coefficients.

If the form of the function $g_n$ is such that $a_{ij} + g_n$ approaches an
asymptotic value, $\pi_{ij}$, as $n$ increases, then the asymptotic values, call them
$\lambda_j$, of the $p_{j,n}$ can be obtained by solving simultaneously the system of $r$
linear equations in the $r$ unknowns $\lambda_j$ ($j = 1, 2, \ldots, r$):

$-(1 - \pi_{11})\lambda_1 + \pi_{21}\lambda_2 + \cdots + \pi_{r1}\lambda_r = 0$
$\pi_{12}\lambda_1 - (1 - \pi_{22})\lambda_2 + \cdots + \pi_{r2}\lambda_r = 0$
$\cdots$
$\pi_{1r}\lambda_1 + \pi_{2r}\lambda_2 + \cdots - (1 - \pi_{rr})\lambda_r = 0$.

This system of equations has two properties of special interest. First, the
asymptotic response probabilities $\lambda_j$ are completely determined by the
parameters $\pi_{ij}$. Second, the mean asymptotic probabilities of the reinforcing
events are determined by the same system of equations. If we let $\pi_j$ represent
the mean asymptotic probability of $E_j$, then clearly

$\pi_j = \lambda_1\pi_{1j} + \lambda_2\pi_{2j} + \cdots + \lambda_r\pi_{rj}$.

But inspecting the $j$th row in the equation system above, we see that

$\lambda_j = \lambda_1\pi_{1j} + \lambda_2\pi_{2j} + \cdots + \lambda_r\pi_{rj}$.

Therefore, $\pi_j = \lambda_j$, i.e., asymptotically the mean probability of a response
is equal to the mean probability of the corresponding reinforcing event.
We have another example of the “probability matching” which has frequently
been noted in studies of probability learning with simple, non-contingent
reinforcement [3, 5, 8, 9, 13]. In the contingent case, there are no fixed
environmental probabilities to be matched by the subject, but the matching
property again obtains when the stimulus-response system arrives at a state
of statistical equilibrium.
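The matching property for more than two response classes can be illustrated by iterating (6) to equilibrium. This is a sketch only: the function names are hypothetical, the $3 \times 3$ matrix in the usage note is an assumed example, and simple iteration is used in place of solving the linear system directly.

```python
def step(p, pi, theta):
    """One application of (6) with constant pi[i][j] = P(E_j | A_i)."""
    r = len(p)
    return [(1 - theta) * p[j] + theta * sum(p[i] * pi[i][j] for i in range(r))
            for j in range(r)]

def asymptotic_probabilities(pi, theta=0.05, n_iter=5000):
    """Approximate the lambda_j by iterating (6) from equal starting values."""
    r = len(pi)
    p = [1.0 / r] * r
    for _ in range(n_iter):
        p = step(p, pi, theta)
    return p

def mean_event_probabilities(lam, pi):
    """Mean probability of each E_j when responses occur with probabilities lam."""
    r = len(lam)
    return [sum(lam[i] * pi[i][j] for i in range(r)) for j in range(r)]

# Hypothetical contingency matrix (each row sums to one); not taken from Fig. 4.
EXAMPLE_PI = [[0.33, 0.33, 0.34],
              [0.50, 0.50, 0.00],
              [0.20, 0.30, 0.50]]
```

Calling `asymptotic_probabilities(EXAMPLE_PI)` and passing the result to `mean_event_probabilities` returns the same three numbers to within rounding, which is the equality $\pi_j = \lambda_j$. The value of `theta` changes only the speed of approach, not the equilibrium.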


In the special case when $g_n = 0$ for all $n$ and $a_{ij} = \pi_{ij}$, the value of
$p_{j,n}$ will be given by an expression of the form

(9)    $p_{j,n} = \lambda_j + C_1 x_1^{n} + C_2 x_2^{n} + \cdots + C_{r-1} x_{r-1}^{n}$,

where the absolute value of each of the $x_i$ is in the range $0 \le |x_i| \le 1$, and the
$C_i$ are constants whose values depend on the initial $p_j$ values and on the $\pi_{ij}$.
It may be noted that all of the $C_i$ need not have the same sign, and
consequently the curve of $p_{j,n}$ will not always be a monotone function of $n$. Some
of the curve forms which arise are illustrated in Fig. 4; the curves in the upper
and lower panels represent the same value of $\theta$ but different combinations
of $\pi_{ij}$, viz.,

               Upper panel                          Lower panel
         E_1     E_2     E_3               E_1           E_2     E_3
A_1      .33     .33     .33               [illegible]   .00     [illegible]
A_2      .50     .50     .00               [illegible]   [illegible]   [illegible]
A_3      [illegible]   [illegible]   [illegible]         .83     .00     .17

It will be apparent from inspection of Fig. 4 that in this case, unlike the
non-contingent case, not only the asymptotes of the learning curves but also
the relative rates at which the curves approach their asymptotes depend
upon the probabilities of reinforcement.


e. Contingency with a lag 


The contingent cases discussed above cover the common types of
experiments in which the probabilities of such reinforcing events as rewards
or knowledge of results on any trial depend on the subject’s response on
that trial. Now we wish to extend the theory to include the more remote
contingencies which arise in games or similar two-person situations. In this
type of situation it is a common strategy to make one’s choice of moves, or
plays, on a given trial depend upon the choices made by one’s adversary on
preceding trials. Regarding the first player as the experimenter and the
second as the subject, we can represent this kind of strategy in the present
model by letting the probability of reinforcement of a given response on
trial $n$ depend upon the subject’s response on some preceding trial, say
$n - \nu$. By the same reasoning used in the case of (2) and (6), we can write a
difference equation for mean probability of response $A_j$ on any given trial:


(10)    $p_{j,n+1} = (1 - \theta)\,p_{j,n} + \theta \sum_i p_{i,n-\nu}\,\pi^{(\nu)}_{ij,n}$,

where $\pi^{(\nu)}_{ij,n}$ represents the probability of reinforcement of $A_j$ on trial $n$
given that $A_i$ occurred on trial $n - \nu$. Equation (10) is difficult to handle
unless the functions $\pi^{(\nu)}_{ij,n}$ differ only by constant terms [i.e.,
$\pi^{(\nu)}_{1j,n} = a_{1j} + g_n(\nu)$, $\pi^{(\nu)}_{2j,n} = a_{2j} + g_n(\nu)$, etc.]; for this
case, (10) reduces to a linear difference equation with constant coefficients

(10a)   $p_{j,n+1} = (1 - \theta)\,p_{j,n} + \theta \sum_i p_{i,n-\nu}\,a_{ij} + \theta\,g_n(\nu)$,

which can be solved explicitly. In order to exhibit some of the most readily
testable implications of this model for experiments involving remote
contingencies, let us consider the special case of two response classes and
$\pi^{(\nu)}_{ij,n}$ independent of $n$. Then for a given contingency lag $\nu$, the
$\pi^{(\nu)}_{ij,n}$ can be treated as constants, and (10) reduces to

(10b)   $p_{1,n+1} = (1 - \theta)\,p_{1,n} + \theta\left[p_{1,n-\nu}\,\pi_{11} + (1 - p_{1,n-\nu})\,\pi_{21}\right]$
        $\phantom{p_{1,n+1}} = (1 - \theta)\,p_{1,n} + \theta(\pi_{11} - \pi_{21})\,p_{1,n-\nu} + \theta\pi_{21}$.

Now [excluding, as before, the case ($\pi_{11} = 1$ and $\pi_{21} = 0$) for which
$p_{1,\infty} = p_{1,1}$] we can obtain the asymptotic probability of response $A_1$ by
setting $p_{1,n+1} = p_{1,n} = p_{1,n-\nu} = p_{1,\infty}$ in (10b) and solving, viz.,

$p_{1,\infty} = \frac{\pi_{21}}{1 - \pi_{11} + \pi_{21}}$.
We obtain the interesting prediction that asymptotic probability is
independent of the contingency lag $\nu$. The complete solution of (10b) is (cf. [12]
for the detailed method of derivation and for the treatment of cases in which
the characteristic roots are not all distinct)

(10c)   $p_{1,n} = C_1 x_1^{n} + C_2 x_2^{n} + \cdots + C_{\nu+1} x_{\nu+1}^{n} + \frac{\pi_{21}}{1 - \pi_{11} + \pi_{21}}$,

where the $C_i$ are constants which can be evaluated from the initial conditions
of the experiment and the $x_i$ are the roots of the characteristic equation

$x^{\nu+1} - (1 - \theta)\,x^{\nu} - \theta(\pi_{11} - \pi_{21}) = 0$.

Except for the degenerate case ($\pi_{11} = 1$ and $\pi_{21} = 0$), the characteristic
roots will have absolute values in the range $0 \le |x| < 1$, and therefore $x^{n}$
will tend to zero as $n$ increases. If the lag $\nu$ is zero, then the characteristic
equation is simply

$x - (1 - \theta) - \theta(\pi_{11} - \pi_{21}) = 0$,

which has the single root

$x = 1 - \theta + \theta\pi_{11} - \theta\pi_{21}$,

and (10c) reduces to (7a) as it should.
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If the lag v is 1, i.e., probability of reinforcement on a given trial depends 
on the response of the immediately preceding trial, the characteristic equa- 
tion is 


. .. (1 om 6)x = O(a 11 ose 21) = 0, 
which has the two roots 


1-— 6+ V(1 = 6)” + 46(7,, a 1) 
vy = 2 








and 





1-— 6-— V(1 — 0)? + 4001 — ma) 
2 





Le = 


The properties of the solution will depend on the relative magnitudes of
$\pi_{11}$ and $\pi_{21}$ as follows:

1. If $\pi_{11} = \pi_{21}$, then $x_1$ and $x_2$ are equal to $1 - \theta$ and 0,
respectively, and (10c) reduces to the (3a) of the simple non-contingent
case.

2. If $\pi_{11} > \pi_{21}$, then $x_1$ and $x_2$ are real numbers, positive
and negative, respectively, with absolute values between 0 and
1. Comparing the larger root, $x_1$, with the characteristic root
for the case of lag 0, we find that the difference between the
former and the latter is always non-negative when $\pi_{11} > \pi_{21}$;
i.e.,

       $\dfrac{1 - \theta + \sqrt{(1 - \theta)^2 + 4\theta(\pi_{11} - \pi_{21})}}{2} - (1 - \theta + \theta\pi_{11} - \theta\pi_{21}) \ge 0$,

and the equality holds only in the degenerate cases ($\theta = 0$;
$\pi_{11} = 1$ and $\pi_{21} = 0$) for which (10c) is inapplicable. Thus it
can be anticipated that when $\pi_{11} > \pi_{21}$, the mean learning curve
will approach its asymptote more slowly for the case of lag 1
than for the case of lag 0.

3. If $\pi_{11} < \pi_{21}$, then neither $x_1$ nor $x_2$ is negative. Both
$x_1$ and $x_2$ are real numbers in the interval $0 < x < (1 - \theta)$ if the
quantity

       $(1 - \theta)^2 + 4\theta(\pi_{11} - \pi_{21})$

is positive; otherwise they are complex numbers with moduli
in the interval $0 < |x| < 1$.

In general the estimation of parameters from data will be difficult when 
there is a contingency lag. Tests of this aspect of the theory can be achieved 
most conveniently by obtaining estimates of $\theta$ from data obtained under
conditions of simple non-contingent or contingent reinforcement and then
computing predicted relationships for experiments run under similar
conditions except for the introduction of contingency lags. Predictions about
asymptotic probabilities are, of course, independent of $\theta$ and thus can be
made in advance of any experiment.
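
The lag predictions above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it iterates the two-response recursion (10b) for lags 0 and 1 and computes the lag-1 characteristic roots. The function names, the parameter values, and the choice of giving the first v + 1 trials the same starting probability are all illustrative assumptions.

```python
import cmath

def lag_roots(theta, pi11, pi21):
    """Roots of x^2 - (1 - theta) x - theta (pi11 - pi21) = 0, the lag-1 case."""
    disc = (1 - theta) ** 2 + 4 * theta * (pi11 - pi21)
    s = cmath.sqrt(disc)  # complex square root, so the case pi11 < pi21 also works
    return ((1 - theta) + s) / 2, ((1 - theta) - s) / 2

def iterate_10b(theta, pi11, pi21, lag, p_init, n_trials):
    """Iterate recursion (10b); the first lag + 1 values are set to p_init (assumption)."""
    p = [p_init] * (lag + 1)
    for _ in range(n_trials):
        p_lagged = p[-1 - lag]  # response probability on trial n - v
        p.append((1 - theta) * p[-1]
                 + theta * (pi11 - pi21) * p_lagged
                 + theta * pi21)
    return p

# Illustrative parameter values (assumed, not taken from any experiment).
theta, pi11, pi21 = 0.2, 0.8, 0.3
asymptote = pi21 / (1 - pi11 + pi21)
x1, x2 = lag_roots(theta, pi11, pi21)
root_lag0 = 1 - theta + theta * (pi11 - pi21)
curve_lag0 = iterate_10b(theta, pi11, pi21, 0, 0.5, 400)
curve_lag1 = iterate_10b(theta, pi11, pi21, 1, 0.5, 400)

print("asymptote:", asymptote)
print("lag 0 final value:", curve_lag0[-1], " lag 1 final value:", curve_lag1[-1])
print("lag 0 root:", root_lag0, " lag 1 roots:", x1, x2)
```

With these values both curves settle at the same asymptote, while the larger lag-1 root exceeds the lag-0 root, in line with case 2.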


Interpretation of the Model 


The theory of reinforcement developed here might be characterized as 
descriptive, rather than explanatory. The concept of reinforcing event 
represents an abstraction from a considerable body of experimental data on 
conditioning and simple motor and verbal learning. In a number of standard 
experimental situations used to study these elementary forms of learning, 
it is possible to identify experimentally defined events or operations whose 
effects upon response probability appear to satisfy the quantitative laws 
expressed by (1a) and (1b). The first task of our quantitative theory is
simply to describe how learning should proceed under various experimental
arrangements when these particular experimental operations are assigned 
the role of reinforcing events. A second task, which becomes important once 
the theory has survived preliminary tests, is to facilitate the identification 
of reinforcing operations in new empirical situations. We can test hypotheses 
concerning a class of events termed reinforcers only if we can state detailed 
testable consequences of class membership. To the extent that the model 
elaborated here acquires standing as a descriptive theory, it will serve also 
to specify the quantitative properties which define membership in the class 
of reinforcers. Although a quantitative theory of this kind does not con- 
tribute immediately to an intensive definition, or interpretive account, of 
reinforcement, it does provide an additional research tool which may
contribute to the construction and testing of explanatory theories.

A general function is derived describing the conditioning of a single
stimulus component in a discriminative situation. This function, together 
with the combinatorial rules of statistical learning theory [5, 12], generates 
empirically testable formulas for learning of classical two-alternative dis- 
criminations, probabilistic discriminations, and discriminations based on 
the outcomes of preceding trials in partial reinforcement experiments. 


From the set-theoretical stimulus model developed by Estes and Burke 
[5], together with a descriptive theory of reinforcement [4], it is possible 
to derive a model for certain aspects of discrimination learning. This model 
is assumed to describe the formation and abolition of conditional relations 
(‘connections’) between response classes and the independently variable 
components or aspects of discriminative stimuli. It does not take account of 
patterning effects, “observing responses’ [13], adaptation of irrelevant cues 
[11], or many other complications, and thus is not expected to provide a 
generally adequate account of discrimination learning. The model does, 
however, describe the data of certain especially simplified experiments 
[1, 6, 7, 12]; these findings have been taken as evidence for the assumption 
that it may eventually form a part of a more comprehensive and adequate 
theory. 

Since our primary concern in this paper is with stimulus variables, 
we shall assume the simplest possible conditions relative to other types of 
variables. For all functions derived, the reference class of experiments will 
involve two stimulating situations which are presented in a random order, 
two alternative response classes with corresponding reinforcing events, and 
determinate, non-contingent reinforcement.†


Definitions and Assumptions 
We shall require the following terms: 


$A_1$ and $A_2$   Mutually exclusive and exhaustive response classes.
$E_1$ and $E_2$   Reinforcing events for $A_1$ and $A_2$, respectively.


*The researches on which this paper is based were facilitated by a grant from the 
National Science Foundation. 

†By "determinate, non-contingent reinforcement" we mean that exactly one
predesignated reinforcing event occurs on each trial and that the probabilities of reinforcing
events are not conditional upon the subjects' responses.

$T_1$ and $T_2$   Types of trials (corresponding to the two stimulating situations
which are to be discriminated).

$\pi_{hj}$ (with $\sum_j \pi_{hj} = 1$)   Probability of $E_j$ on trials of type $T_h$.

$\beta$ and $1 - \beta$   Relative frequencies of $T_1$ and $T_2$ trials during a learning series.

$S_1$ and $S_2$   Sets of stimulus elements available for sampling only on $T_1$ and
only on $T_2$ trials, respectively.

$S_c$   Set of stimulus elements available for sampling on all trials (i.e.,
elements common to the two stimulating situations).

$S$   Set of all elements associated with either of the two stimulating
situations; i.e.,

       $S = S_1 \cup S_2 \cup S_c$,

where $\cup$ indicates set-theoretical summation.

$N_i$   Number of elements in $S_i$.

$\theta_{i1}$ and $\theta_{i2}$   Sampling probability of the ith element of S on trials of type $T_1$
and $T_2$, respectively.

$F_{i,n}$   Probability that the ith element of S is "connected to" response
$A_1$ on trial n (i.e., at the beginning of trial n).

$F_i$   Limiting value of $F_{i,n}$ for large n.

$p_{h1,n}$   Probability of $A_1$, in the stimulating situation corresponding to
$T_h$, on trial n.

$P_n$   Probability of a "correct response" on trial n, where a correct
response is defined as $A_1$ on $T_1$ trials and $A_2$ on $T_2$ trials so that

(1)    $P_n = \beta\,p_{11,n} + (1 - \beta)\,p_{22,n}$.

Our assumptions about learning are taken directly from previous treat- 
ments of learning in elementary, non-discriminative situations [4, 5, 8]. 
Specifically, we assume: 

a. At any time, each stimulus element in S is "connected to" exactly
one of the responses $A_1$ or $A_2$.

b. The $F_{i,n}$ remain fixed in value except when reinforcing events occur.

c. If $E_1$ occurs on trial n, then

(2)    $F_{i,n} = (1 - \theta_{ih})\,F_{i,n-1} + \theta_{ih}$.

d. If $E_2$ occurs on trial n, then

(3)    $F_{i,n} = (1 - \theta_{ih})\,F_{i,n-1}$.

In (2) and (3), h = 1 or h = 2 according as the trial is of type $T_1$ or $T_2$. The
$F_{i,n}$ are brought into relation to response probabilities by the assumption
that probability of a given response on any trial is equal to the proportion
of stimulus elements in the trial sample that are "connected to" the response.
Intuitively, a "connection" is obviously intended to represent a learned
association, but its formal properties are limited to those expressed above.


Learning Functions for a Single Stimulus Element
General case 


Using the terms and assumptions given above, we can write a general
difference equation expressing $F_{i,n}$, the probability that the ith element
of S is connected to response $A_1$ on trial n, as a linear function of the
probability on the preceding trial.

(4)    $F_{i,n} = \beta\,[(1 - \theta_{i1})F_{i,n-1} + \theta_{i1}\pi_{11}] + (1 - \beta)\,[(1 - \theta_{i2})F_{i,n-1} + \theta_{i2}\pi_{21}]$
              $= (1 - \beta\theta_{i1} - \theta_{i2} + \beta\theta_{i2})\,F_{i,n-1} + \beta\theta_{i1}\pi_{11} + (1 - \beta)\theta_{i2}\pi_{21}$.



If the quantity $F_{i,n-1}$ on the right side of (4) represents a probability associated
with a particular subject on trial n - 1, then clearly the quantity $F_{i,n}$ on
the left represents the expected value on trial n, where the expectation is
taken over the possible stimulus sampling outcomes and reinforcing outcomes
of trial n - 1. If an experiment were repeated many times (or, equivalently,
if a population of subjects with like parameter values were run simultaneously
through an experiment) there would result a distribution of values of $F_{i,n-1}$,
and therefore also of $F_{i,n}$. For each value of $F_{i,n-1}$, the conditional mean
value of $F_{i,n}$ is given by (4). Therefore the relation between the mean values
of $F_{i,n-1}$ and $F_{i,n}$ is also given by (4). We shall now suppose that this averaging
has been carried out, and in the remainder of this paper we shall interpret
$F_{i,n}$ and $F_{i,n-1}$ in (4) and all following equations as expected values for the
population of all repetitions of a given experiment (or, equivalently, for all
subjects having a given set of parameter values). With this interpretation,
and with the restriction that the parameters $\theta_{ih}$ and $\pi_{hj}$ are constant over
trials, (4) can be solved. The resulting expression for $F_{i,n}$, readily verified
by induction, is

(5)    $F_{i,n} = F_i - (F_i - F_{i,1})\,[1 - \beta\theta_{i1} - (1 - \beta)\theta_{i2}]^{n-1}$,

where

(6)    $F_i = \dfrac{\beta\theta_{i1}\pi_{11} + (1 - \beta)\theta_{i2}\pi_{21}}{\beta\theta_{i1} + (1 - \beta)\theta_{i2}}$.

The quantity $[1 - \beta\theta_{i1} - (1 - \beta)\theta_{i2}]$ is bounded in the interval 0 to 1, so
$F_{i,n}$ describes a negatively accelerated curve with $F_i$ as the asymptote. It
will be apparent from inspection of (6) that $F_i$ is also equal to the conditional
probability of $E_1$ on trials when the ith element of S is sampled; i.e.,

(7)    $F_i = \Pr(E_1 \mid X_i = 1)$,

where $X_i$ is a random variable that takes on the values 1 and 0 according
as element i is or is not sampled on any trial.
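
The claim that (5) solves (4) can be checked by direct iteration. The sketch below is illustrative only: the function names and parameter values are assumptions, and a single element's two sampling probabilities are passed in as `theta1` and `theta2`.

```python
def iterate_eq4(beta, theta1, theta2, pi11, pi21, f_first, n):
    """Apply recursion (4) n - 1 times, starting from F_{i,1} = f_first."""
    f = f_first
    for _ in range(n - 1):
        f = ((1 - beta * theta1 - (1 - beta) * theta2) * f
             + beta * theta1 * pi11
             + (1 - beta) * theta2 * pi21)
    return f

def closed_form_eq5(beta, theta1, theta2, pi11, pi21, f_first, n):
    """Closed-form solution (5) with the asymptote F_i given by (6)."""
    sampled = beta * theta1 + (1 - beta) * theta2  # overall sampling probability
    f_limit = (beta * theta1 * pi11 + (1 - beta) * theta2 * pi21) / sampled
    return f_limit - (f_limit - f_first) * (1 - sampled) ** (n - 1)

# Illustrative values (assumed).
params = dict(beta=0.6, theta1=0.3, theta2=0.1, pi11=0.9, pi21=0.2, f_first=0.5)
max_gap = max(
    abs(iterate_eq4(n=n, **params) - closed_form_eq5(n=n, **params))
    for n in range(1, 60)
)
print("largest difference between (4) iterated and (5):", max_gap)
```

The difference stays at floating-point rounding level, as the induction argument implies.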





Special cases 

Learning functions for simple acquisition, classical discrimination 
learning, and probabilistic discrimination learning are now obtainable as 
special cases of (5).

a. A discriminative situation reduces to simple acquisition if only one
stimulating situation is ever presented. When we express this restriction by
setting $\beta = 1$ and dropping the h subscripts in our general discrimination
function, (5) reduces to

(8)    $F_{i,n} = \pi_1 - (\pi_1 - F_{i,1})(1 - \theta_i)^{n-1}$.


b. In a classical discrimination problem, the two stimuli to be dis- 
criminated (e.g., black card vs. white card, bright light vs. dim light, circle 
vs. triangle) together with background, or contextual stimulation, are 
represented by two stimulus sets. On $T_1$ trials, the set $S_1 \cup S_c$ is available
for sampling, and on $T_2$ trials the set $S_2 \cup S_c$ is available. It will often be
assumed that $\theta$ values are equal for all elements within each of the three
subsets $S_1$, $S_2$, and $S_c$. With this assumption, (5) becomes for $S_1$:

(9)    $F_{1,n} = \pi_{11} - (\pi_{11} - F_{1,1})(1 - \beta\theta_1)^{n-1}$;

for $S_2$:

(10)   $F_{2,n} = \pi_{21} - (\pi_{21} - F_{2,1})[1 - (1 - \beta)\theta_2]^{n-1}$;

and for $S_c$:

(11)   $F_{c,n} = \pi_c - (\pi_c - F_{c,1})(1 - \theta_c)^{n-1}$,

where

       $\pi_c = \beta\pi_{11} + (1 - \beta)\pi_{21}$.

c. Traditionally, discrimination learning has been regarded as a matter 
of distinguishing two (or more) stimulating situations which differ with 
respect to certain component stimuli or stimulus attributes; and discrimi- 
nation theories have been limited to this paradigm. More generally, however, 
there is a basis for discrimination learning whenever some of the components 
or attributes of a situation bear non-random relationships to reinforcing 
events. If two “situations’’ to be discriminated include the same stimulus 
components, and differ only with respect to the sampling probabilities of 
the components, we shall speak of probabilistic discrimination learning. 
Just as partial reinforcement represents a natural generalization of simple 
acquisition and extinction procedures, probabilistic discrimination learning 
represents a natural generalization of the conventional discrimination 
paradigm. 

In the terms defined above, the condition for probabilistic discrimination
learning relative to the two "situations" sampled on trials of types
$T_1$ and $T_2$ is simply that there be some difference between the distributions
of $\theta_{i1}$ and $\theta_{i2}$. A simple arrangement which has proved convenient for
experimental tests of the theory [cf. 7] is the following. The reinforcement
probabilities $\pi_{11}$ and $\pi_{21}$ are set equal to unity and zero, respectively; i.e.,
response $A_1$ is always reinforced on $T_1$ trials and $A_2$ is always reinforced
on $T_2$ trials. The stimulating situation includes N separably manipulable
components. On $T_1$ trials, the sampling probabilities of the components,
$\theta_{i1}$ for i = 1, 2, ..., N, are given by the linear function

(12)   $\theta_{i1} = a_1 + b_1 i$;

and on $T_2$ trials the sampling probabilities, $\theta_{i2}$ for i = 1, 2, ..., N, are given
by

(13)   $\theta_{i2} = a_2 + b_2 i$.

In this case, (5) becomes

(14)   $F_{i,n} = \dfrac{\beta(a_1 + b_1 i)}{\bar a + \bar b i} - \left[\dfrac{\beta(a_1 + b_1 i)}{\bar a + \bar b i} - F_{i,1}\right](1 - \bar a - \bar b i)^{n-1}$,

where

       $\bar a = \beta a_1 + (1 - \beta) a_2$

and

       $\bar b = \beta b_1 + (1 - \beta) b_2$.

Now if we set $b_2 = -b_1$ and $\beta = 1/2$, values which, in fact, were used in
the experimental application referred to above, the asymptotic value of
$F_{i,n}$ is a linear function of i:

       $F_i = \dfrac{1}{2\bar a}(a_1 + b_1 i)$.

This case is obviously advantageous for statistical tests of the correspondence
between predicted and observed values of $F_i$. A convenient estimator of
$F_i$ is the proportion of $A_1$ responses evoked by the ith stimulus component
when presented alone.


Learning Functions in Terms of Response Probability 


Given the formulas for F;,, , it is a purely mathematical problem to 
derive expressions for response probability. Two principal cases arise: (a) 
that in which one wishes to predict response probability in the presence of 
a specified sample of stimulus elements (as might be the case in an experiment 
on stimulus compounding or transfer); and (b) that in which one wishes to 
predict response probability when stimulus elements are being sampled 
randomly from a specified population (as might be the case in experiments 
on acquisition and reversal of discriminations). Case (a) is the simpler, since 
expected response probability will be given directly by the arithmetic mean 
of the $F_{i,n}$, i.e.,

(15)   $p_{1,n} = \dfrac{1}{K}\sum_i F_{i,n}$,

where the summation is taken over the K elements in the given sample.

In case (b) we shall follow the procedure used in our earlier treatment
of simple learning [5] and replace the arithmetic mean of the $F_{i,n}$ with the
weighted mean:

(16)   $p_{h1,n} = \dfrac{1}{N_h\bar\theta_h}\sum_i \theta_{ih}F_{i,n}$

(the summation being taken over the set $S_h$). Equation (16) does not, in
general, give the exact value for the mean proportion of $A_1$-connected elements
in trial samples. It does give the exact value when the sample size is fixed,
and it approximates the exact value for all cases in which elements are
sampled independently and $N_h$ is large. If $N_h$ is very small, (16) may not
be a satisfactory estimator, but in this instance direct computation of
expected sample proportions will not be difficult.

To illustrate the derivation of learning functions by means of (16), we 
shall treat a number of special cases which have arisen in experimental 
applications of the model. 


Acquisition with random reinforcement 
If an experiment involves but a single stimulating situation, the
acquisition function can be obtained by substituting (8) into (16), viz.,

(17)   $p_{1,n} = \dfrac{1}{N\bar\theta}\sum_i \theta_i\,[\pi_1 - (\pi_1 - F_{i,1})(1 - \theta_i)^{n-1}]$
              $= \pi_1 - [\pi_1 - p_{1,1}]\,\dfrac{1}{N\bar\theta}\sum_i \theta_i(1 - \theta_i)^{n-1}$.

In the derivation of (17), and in all ensuing derivations, we assume for
simplicity that the values of $F_{i,1}$ and $\theta_i$ are uncorrelated. The most interesting
consequence of (17) is the prediction that asymptotically response probability
should approach the probability of reinforcement regardless of the
distribution of $\theta_i$ values. If all of the $\theta_i$ are equal, (17) reduces to the function

       $p_{1,n} = \pi_1 - [\pi_1 - p_{1,1}](1 - \theta)^{n-1}$

used by Estes and Straughan [8], Neimark [10] and others to describe
acquisition under non-contingent random reinforcement.
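
A short numerical sketch of (17) makes the asymptotic claim concrete. It is an illustration only: the function name, the particular list of sampling probabilities, and the starting probability are assumptions, and the second line of (17) is used, which already relies on the stated assumption that initial values and sampling probabilities are uncorrelated.

```python
def acquisition_eq17(thetas, pi1, p_first, n):
    """Second line of (17): mean response probability on trial n."""
    n_elements = len(thetas)
    theta_bar = sum(thetas) / n_elements
    # Weighted average of the element-wise decay factors (1 - theta_i)^(n - 1).
    decay = sum(t * (1 - t) ** (n - 1) for t in thetas) / (n_elements * theta_bar)
    return pi1 - (pi1 - p_first) * decay

# Unequal sampling probabilities (assumed values).
mixed = [0.05, 0.1, 0.2, 0.4]
start = acquisition_eq17(mixed, pi1=0.7, p_first=0.5, n=1)
late = acquisition_eq17(mixed, pi1=0.7, p_first=0.5, n=2000)

# Equal sampling probabilities: compare with the single-exponential special case.
equal = acquisition_eq17([0.2] * 5, pi1=0.7, p_first=0.5, n=10)
simple = 0.7 - (0.7 - 0.5) * (1 - 0.2) ** 9

print(start, late, equal, simple)
```

Whatever the spread of the sampling probabilities, the curve starts at the initial probability and ends at the reinforcement probability; only its shape in between changes.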


Classical discrimination learning 


For discrimination experiments involving two stimulating situations 
which differ with respect to some of their components, learning functions 
can be obtained by substituting into (16) from (9) and (11) or from (10) and
(11), the former yielding probability of response $A_1$ on $T_1$ trials and the latter
probability of $A_1$ on $T_2$ trials. Using the notation $\bar\theta_{1c}$ for the mean value
of $\theta$ over all elements available for sampling on $T_1$ trials, i.e.,

       $\bar\theta_{1c} = \dfrac{N_1\theta_1 + N_c\theta_c}{N_1 + N_c}$,

we obtain for probability of $A_1$ on $T_1$ trials

       $p_{11,n} = \dfrac{1}{(N_1 + N_c)\bar\theta_{1c}}\{N_1\theta_1[\pi_{11} - (\pi_{11} - F_{1,1})(1 - \beta\theta_1)^{n-1}]$
                 $+ N_c\theta_c[\pi_c - (\pi_c - F_{c,1})(1 - \theta_c)^{n-1}]\}$.

Letting

       $w_1 = \dfrac{N_1\theta_1}{N_1\theta_1 + N_c\theta_c}$

and

       $w_c = \dfrac{N_c\theta_c}{N_1\theta_1 + N_c\theta_c}$,

this expression simplifies to

(18)   $p_{11,n} = w_1\pi_{11} + w_c\pi_c - w_1(\pi_{11} - F_{1,1})(1 - \beta\theta_1)^{n-1} - w_c(\pi_c - F_{c,1})(1 - \theta_c)^{n-1}$.

Similarly, for probability of $A_1$ on $T_2$ trials, we obtain

(19)   $p_{21,n} = w_2\pi_{21} + w_c\pi_c - w_2(\pi_{21} - F_{2,1})[1 - (1 - \beta)\theta_2]^{n-1} - w_c(\pi_c - F_{c,1})(1 - \theta_c)^{n-1}$,

where the weights $w_2$ and $w_c$ are now formed in the same way from $N_2\theta_2$ and $N_c\theta_c$.

The predicted asymptotic accuracy of discrimination is seen to depend both
on the amount of overlap between the two stimulating situations and on the
values of $\pi_{11}$ and $\pi_{21}$. It has been customary in researches on discrimination
learning to give uniform reinforcement in the presence of one situation and
uniform nonreinforcement in the presence of the other, i.e., to let $\pi_{11} = 1$
and $\pi_{21} = 0$. This restriction is clearly unessential, however; theoretically,
better than chance discrimination will be possible whenever the values of
$\pi_{11}$ and $\pi_{21}$ differ, provided, of course, that $w_c$ is less than unity. In a recent
experiment conducted to test the theory, $\pi_{11}$ and $\pi_{21}$ were set equal to unity
and 0.5, respectively, and the empirical curves of $p_{11,n}$ and $p_{21,n}$ diverged
in accordance with theoretical expectation [6].
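
The dependence of asymptotic discrimination on stimulus overlap can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function name, the set sizes, the sampling probabilities, and the initial values. The weights are computed separately for each trial type from the element counts and sampling probabilities of the sets available on that type of trial.

```python
def discrimination_curves(n, beta, pi11, pi21,
                          n1, n2, nc, theta1, theta2, theta_c,
                          f1_first, f2_first, fc_first):
    """Probability of A1 on T1 and on T2 trials at trial n, from (9)-(11) and (16)."""
    pi_c = beta * pi11 + (1 - beta) * pi21
    f1 = pi11 - (pi11 - f1_first) * (1 - beta * theta1) ** (n - 1)        # (9)
    f2 = pi21 - (pi21 - f2_first) * (1 - (1 - beta) * theta2) ** (n - 1)  # (10)
    fc = pi_c - (pi_c - fc_first) * (1 - theta_c) ** (n - 1)              # (11)
    p11 = (n1 * theta1 * f1 + nc * theta_c * fc) / (n1 * theta1 + nc * theta_c)
    p21 = (n2 * theta2 * f2 + nc * theta_c * fc) / (n2 * theta2 + nc * theta_c)
    return p11, p21

# Reinforcement probabilities of 1 and 0.5, as in the experiment cited in the text;
# all other values are assumed for illustration.
common = dict(beta=0.5, pi11=1.0, pi21=0.5, theta1=0.2, theta2=0.2, theta_c=0.2,
              f1_first=0.5, f2_first=0.5, fc_first=0.5)
first_trial = discrimination_curves(1, n1=10, n2=10, nc=10, **common)
half_overlap = discrimination_curves(5000, n1=10, n2=10, nc=10, **common)
no_overlap = discrimination_curves(5000, n1=10, n2=10, nc=0, **common)

print(first_trial, half_overlap, no_overlap)
```

With equal unique and common sets the two curves start together and separate to intermediate asymptotes; with no common elements they reach the reinforcement probabilities themselves.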


Probabilistic discrimination learning 


When the “situations” to be discriminated differ only with respect to 
the sampling probabilities of component stimuli, the discrimination curves 








140 PSYCHOMETRIKA 


for probability of $A_1$ on $T_1$ and $T_2$ trials are given by (16) with h = 1 and
h = 2, respectively:

(20)   $p_{11,n} = \dfrac{1}{N\bar\theta_1}\sum_i \theta_{i1}F_{i,n}$

and

(21)   $p_{21,n} = \dfrac{1}{N\bar\theta_2}\sum_i \theta_{i2}F_{i,n}$,

$F_{i,n}$ being given by (5) in each case. In general, better than chance
discrimination will be theoretically predicted whenever there is any difference
in the distribution of $\theta$ values on $T_1$ and $T_2$ trials. When reinforcement is
uniform and the $\theta_{ih}$ distributions are linear, as assumed in the derivation of
(14), the asymptotic values of $p_{11,n}$ and $p_{21,n}$ are given by

(22)   $p_{11,\infty} = \dfrac{1}{N\bar\theta_1}\sum_i \dfrac{\beta(a_1 + b_1 i)^2}{\bar a + \bar b i}$

and

(23)   $p_{21,\infty} = \dfrac{1}{N\bar\theta_2}\sum_i \dfrac{\beta(a_2 + b_2 i)(a_1 + b_1 i)}{\bar a + \bar b i}$,

where

       $\bar\theta_1 = \dfrac{1}{N}\sum_i (a_1 + b_1 i)$

and $\bar\theta_2$ is defined analogously.

In an experimental test of the theory [7], we have set $b_2 = -b_1$, $a_1 = 0$, and
$a_2 = (N + 1)b_1$. With these restrictions,

       $\bar\theta_1 = \bar\theta_2 = b_1(N + 1)/2$

and (22) and (23) reduce to

(24)   $p_{11,\infty} = \dfrac{2\beta}{N(N + 1)}\sum_{i=1}^{N} \dfrac{i^2}{(1 - \beta)(N + 1) + (2\beta - 1)i}$

and

(25)   $p_{21,\infty} = \dfrac{2\beta}{N(N + 1)}\sum_{i=1}^{N} \dfrac{i(N + 1 - i)}{(1 - \beta)(N + 1) + (2\beta - 1)i}$,

and we have the curious result that asymptotic response probabilities are
independent of $b_1$, the slope parameter of the $\theta$ distributions. For the special
case of $\beta = 1/2$, these asymptotic expressions reduce further to

(26)   $p_{11,\infty} = \dfrac{2}{N(N + 1)^2}\sum_{i=1}^{N} i^2$
              $= \dfrac{2N + 1}{3(N + 1)}$,

and

(27)   $p_{21,\infty} = \dfrac{2}{N(N + 1)^2}\sum_{i=1}^{N} i\,[N + 1 - i]$
              $= \dfrac{2}{N(N + 1)^2}\sum_{i=1}^{N} [(N + 1)i - i^2]$
              $= \dfrac{2}{N(N + 1)^2}\left[\dfrac{N(N + 1)^2}{2} - \dfrac{N(N + 1)(2N + 1)}{6}\right]$
              $= \dfrac{N + 2}{3(N + 1)}$.

Asymptotic probability of correct responding in this case is seen to vary
from 1/2 to 2/3 as N ranges from one to infinity.
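
The special-case asymptotes can be verified with exact rational arithmetic. The sketch below is illustrative and rests on the model equations as stated above, namely the asymptote (6) with uniform reinforcement and the weighted mean (16), under the restrictions $b_2 = -b_1$, $a_1 = 0$, $a_2 = (N + 1)b_1$. The function name and the default slope are assumptions.

```python
from fractions import Fraction

def asymptotic_probabilities(n_components, beta=Fraction(1, 2), b1=None):
    """Asymptotic probability of A1 on T1 and on T2 trials for the linear design."""
    N = n_components
    if b1 is None:
        b1 = Fraction(1, N + 1)  # assumed slope; keeps every sampling probability below one
    num11 = num21 = sum1 = sum2 = Fraction(0)
    for i in range(1, N + 1):
        theta_i1 = b1 * i                # (12) with a1 = 0
        theta_i2 = b1 * (N + 1 - i)      # (13) with a2 = (N + 1) b1 and b2 = -b1
        # Asymptote (6) with pi11 = 1 and pi21 = 0.
        f_i = beta * theta_i1 / (beta * theta_i1 + (1 - beta) * theta_i2)
        num11 += theta_i1 * f_i
        num21 += theta_i2 * f_i
        sum1 += theta_i1
        sum2 += theta_i2
    # Weighted means as in (16); the sums equal N times the mean sampling probability.
    return num11 / sum1, num21 / sum2

for N in (1, 2, 5, 50):
    p11, p21 = asymptotic_probabilities(N)
    print(N, p11, p21, "correct on T2:", 1 - p21)
```

For equal trial frequencies the probability of a correct response on either trial type comes out as (2N + 1)/(3(N + 1)), which rises from 1/2 at N = 1 toward 2/3.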

One point of interpretation concerning our functions for probabilistic 
discrimination learning requires especial emphasis. The stimulus sampling 
probabilities $\theta_{ih}$ may be associated either with the hypothetical elements
of an experimentally homogeneous stimulating situation or with independently
manipulable components of a situation. In the former case, the derived
functions should be applicable as they stand. In the latter case, $\theta_{ih}$ represents
a product of two probabilities, i.e.,

       $\theta_{ih} = \theta'_{ih}\,\theta$.

The parameter $\theta'_{ih}$ is the (experimenter-determined) sampling probability
of the ith stimulus component. The parameter $\theta$ is the (subject-determined)
probability that any element associated with the ith component will be
sampled on trials when the ith component is present. In the experiment cited
above [7] the N stimulus components were signal lights with sampling
probabilities, $\theta'_{ih}$, prescribed by the experimental design. Since all of the
signal lights were similar in physical properties, we assumed that the subsets
of stimulus elements associated with the various individual lights all had the
same value of $\theta$. When this assumption is satisfied, the asymptotic values of
$F_{i,n}$ and $p_{h1,n}$ are independent of $\theta$, and therefore are predictable in advance
of an experiment. (6) becomes

(28)   $F_i = \dfrac{\beta\theta\theta'_{i1}\pi_{11} + (1 - \beta)\theta\theta'_{i2}\pi_{21}}{\beta\theta\theta'_{i1} + (1 - \beta)\theta\theta'_{i2}}$
           $= \dfrac{\beta\theta'_{i1}\pi_{11} + (1 - \beta)\theta'_{i2}\pi_{21}}{\beta\theta'_{i1} + (1 - \beta)\theta'_{i2}}$,

and by substitution into (16)

(29)   $p_{11,\infty} = \dfrac{1}{N_1\bar\theta_1}\sum_i \theta_{i1}F_i$
               $= \dfrac{1}{N_1\theta\bar\theta'_1}\sum_i \theta\theta'_{i1}F_i$
               $= \dfrac{1}{N_1\bar\theta'_1}\sum_i \theta'_{i1}F_i$,

where $N_1$ is the number of stimulus components available for sampling on
trials of type $T_1$, and $\bar\theta'_1$ is the mean of their (experimenter-determined)
sampling probabilities. The slope parameters of the discrimination learning
curves will contain $\theta$ as a factor; for example, the exponential term of (5)
will become

       $[1 - \beta\theta\theta'_{i1} - (1 - \beta)\theta\theta'_{i2}]^{n-1}$.

The problem of estimating $\theta$ from experimental data in this case will be
essentially the same as in the case of simple acquisition functions (cf., e.g.,
[1, 8]).


Discriminations based on traces of reinforcing stimuli 


Numerous experiments concerning learning with partial reinforcement 
have suggested that when trials are sufficiently massed, the subject forms a 
discrimination based on the stimulus after-effects of reinforcement and non- 
reinforcement (see, e.g., [3], pp. 16-18). This type of discrimination learning 
should be expected to show up especially clearly if experimental arrange- 
ments prescribe a non-random relationship between probability of reinforce- 
ment on any given trial and the reinforcing outcome of the preceding trial; 
for an example of such an arrangement, see [9]. In order to treat trace
discrimination in terms of the present model, we shall assume that one set of
stimulus elements, $S_1$, is available for sampling on trials following reinforcing
event $E_1$, and a second set, $S_2$, is available on trials following occurrences
of $E_2$. For simplicity we shall limit our derivations to the optimal case in
which $S_1$ and $S_2$ have no common elements. The parameters $\pi_{11}$ and $\pi_{21}$
will now be taken to represent probabilities of $E_1$ on trials following $E_1$ and
$E_2$ occurrences, respectively; $\pi$ will be the average probability of $E_1$ trials,
and must satisfy the relation

(30)   $\pi = \pi\pi_{11} + (1 - \pi)\pi_{21}$
           $= \dfrac{\pi_{21}}{1 - \pi_{11} + \pi_{21}}$.

Substituting $\pi$ for $\beta$ in (9) and (10), and letting $\theta_1 = \theta_2 = \theta$, we obtain


(31)   $F_{1,n} = \pi_{11} - (\pi_{11} - F_{1,1})(1 - \theta\pi)^{n-1}$

and

(32)   $F_{2,n} = \pi_{21} - (\pi_{21} - F_{2,1})[1 - \theta(1 - \pi)]^{n-1}$.

From (31) and (32) we can compute expected probabilities of response $A_1$
following $E_1$ and $E_2$ trials by taking account of the incremental or decremental
effect of the reinforcing event:

(33)   $p_{11,n} = (1 - \theta\pi_{11})\,F_{1,n-1} + \theta\pi_{11}$
              $= \pi_{11} + \theta\pi_{11}(1 - \pi_{11}) - (\pi_{11} - p_{11,1})(1 - \theta\pi_{11})(1 - \theta\pi)^{n-2}$,

(34)   $p_{21,n} = (1 - \theta + \theta\pi_{21})\,F_{2,n-1}$
              $= \pi_{21} - \theta(1 - \pi_{21})\pi_{21} - (\pi_{21} - p_{21,1})[1 - \theta(1 - \pi_{21})][1 - \theta(1 - \pi)]^{n-2}$,

and on the average

(35)   $p_{1,n} = \pi\,p_{11,n} + (1 - \pi)\,p_{21,n}$
             $= \pi - \theta(1 - \pi_{11})\pi_{21}\,\dfrac{(1 - \pi_{21} - \pi_{11})}{(1 - \pi_{11} + \pi_{21})}$
               $- \pi(\pi_{11} - p_{11,1})(1 - \theta\pi_{11})(1 - \theta\pi)^{n-2}$
               $- (1 - \pi)(\pi_{21} - p_{21,1})[1 - \theta(1 - \pi_{21})][1 - \theta(1 - \pi)]^{n-2}$.

The gist of these results is that the probabilities of $A_1$ on trials following
$E_1$ and $E_2$ occurrences should tend asymptotically to the corresponding conditional
probabilities of $E_1$, namely $\pi_{11}$ and $\pi_{21}$, respectively, plus or minus "correction terms" which are
smaller in magnitude than $\theta$; and the average probability of $A_1$ should tend
asymptotically to the average probability of $E_1$, plus or minus a "correction
term" which is smaller in magnitude than $\theta$.
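
The asymptotic terms of (33) to (35) can be checked against one another numerically. This is a minimal sketch with assumed parameter values and an illustrative function name; it evaluates only the limiting values, not the transient terms.

```python
def trace_asymptotes(theta, pi11, pi21):
    """Limiting values of (33), (34), and their average, with pi taken from (30)."""
    pi = pi21 / (1 - pi11 + pi21)                   # (30)
    p11 = pi11 + theta * pi11 * (1 - pi11)          # limit of (33)
    p21 = pi21 - theta * (1 - pi21) * pi21          # limit of (34)
    p1 = pi * p11 + (1 - pi) * p21                  # average over preceding outcomes
    return pi, p11, p21, p1

theta, pi11, pi21 = 0.3, 0.7, 0.4                   # assumed values
pi, p11, p21, p1 = trace_asymptotes(theta, pi11, pi21)

# Constant term of (35), written out directly.
closed = pi - theta * (1 - pi11) * pi21 * (1 - pi21 - pi11) / (1 - pi11 + pi21)

print("pi:", pi)
print("after E1:", p11, " after E2:", p21)
print("average:", p1, " constant term of (35):", closed)
```

The average of the two conditional asymptotes agrees with the constant term of (35), and each correction term is smaller than the learning-rate parameter.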

For purposes of experimental interpretation the functions derived here
should be expected to apply when trials are sufficiently massed and the
stimuli associated with $E_1$ and $E_2$ sufficiently distinct so that the assumption
of no overlap between $S_1$ and $S_2$ is tenable. As trial spacing is increased, we
should expect the communality of $S_1$ and $S_2$ to increase until with extreme
spacing, they are no longer discriminable. Under the latter condition, the
subject is in effect sampling a single stimulus population on all trials, and
(33), (34), and (35) reduce to

(36)   $p_{11,n} = \pi + \theta(1 - \pi) - (\pi - p_{1,1})(1 - \theta)^{n-1}$,

(37)   $p_{21,n} = \pi - \theta\pi - (\pi - p_{1,1})(1 - \theta)^{n-1}$,

and

(38)   $p_{1,n} = \pi - (\pi - p_{1,1})(1 - \theta)^{n-1}$.

Cases of intermediate spacing should be expected to fall between these two
extremes. Explicit expressions for cases involving partial overlap between
$S_1$ and $S_2$ can be derived by obvious extensions of the methods illustrated
here.


The Role of Component Models in Discrimination Theory 


How can we characterize the empirical adequacy of the model presented 
here as a theory of discrimination learning? At a minimum, it provides a 
quantitative account of data in certain experiments conducted under con- 
ditions especially designed to satisfy the simplifying assumptions of the 
model and predicts some new phenomena, notably those of probabilistic 
discrimination learning. More generally, the model appears to give a 
reasonable account of the development of differential S-R correlations 
under differential reinforcement, of the relation found in some situations 
between asymptotic accuracy of discriminations and stimulus overlap or 
“similarity,’’ and of transfer phenomena observed when the components of a 
discriminative situation are tested in new combinations following the develop- 
ment of discrimination (see, e.g., [7, 12]). The component model does not 
account for the fact that in some situations subjects, animal or human, 
are able to achieve essentially perfect discriminations between stimuli which 
have components or attributes in common. 

Formally our stimulus model represents the type of component-sampling 
model which, with minor variations in detail, underlies numerous con- 
temporary approaches to discrimination theory, e.g., those of Bush and 
Mosteller [2], Restle [11], and Wyckoff [13] as well as our own. Insofar as 
stimulus variables are concerned, our model has been elaborated in greater 
detail than the others, but we have not attempted to handle effects of work 
or attentional factors. The logical next step in our line of investigation must 
be to examine possible auxiliary hypotheses, for example those relating to 
“observing responses” or adaptation of irrelevant cues, in order to determine 
how the present limited theory may be most effectively extended to cover a 
broader range of discrimination experiments.

Solutions of the communality problem and of the problem of meaning
of common and unique factors have been shown previously to depend 
intimately on certain relations with ordinary multiple correlation. To 
make these basic propositions more accessible, simple proofs of some of them 
are provided here, avoiding any matrix algebra. New results are also 
obtained, with no extra work, that extend the previously known propositions 
to a more general class of coefficients than that of communalities. 


For any population of individuals and any set of n observed variables, 
the following inequality is known always to hold. If $\rho_j$ is the multiple
correlation coefficient of the jth variable on the n - 1 remaining ones, and if $h_j^2$
is a communality of the jth variable, then

(1)    $\rho_j^2 \le h_j^2$   (j = 1, 2, ..., n).


Inequality (1) has been established in several different ways (cf. [1], pp. 92-3;
[2], p. 278; [3], p. 293; [7]). 

One important use of the inequality is in conjunction with the Spearman- 
Thurstone hypothesis that the given correlation matrix results from m 
common factors, where m is much smaller than n. An inequality equivalent 
to (1) is 


(a)    $\dfrac{1 - h_j^2}{1 - \rho_j^2} \le 1$.

It has been shown in [6] that the arithmetic mean, over j, of the left member
of inequality (a) satisfies

(b)    $\dfrac{1}{n}\sum_{j=1}^{n} \dfrac{1 - h_j^2}{1 - \rho_j^2} \ge 1 - \dfrac{m}{n}$.

For example, if 50 tests have only 5 factors in common, so that m/n = .10,
then according to (b) the mean ratio of $1 - h_j^2$ to $1 - \rho_j^2$ is not less than .90.


*Revised from a paper written while on leave at the Center for Advanced Study in 
the Behavioral Sciences.

This implies that, on the average, $h_j^2$ is not more than .10 larger than $.90\rho_j^2$.
Thus, if $\rho_j^2 = .50$, $h_j^2$ is on the average not more than .55.

As m/n gets smaller, the bound to the average degree of approximation
of $\rho_j^2$ to $h_j^2$ improves; and as m/n → 0 (which will happen if the battery of
tests is enlarged and the Spearman-Thurstone hypothesis continues to hold),
it must be that $\rho_j^2 \to h_j^2$ for almost all j. Conversely, if trial communalities
are computed which leave large discrepancies between the left and right 
members of (1), then either the trial communalities are erroneous or the 
Spearman-Thurstone hypothesis concerning the smallness of m/n must be 
rejected. 

Another use of inequality (1) is for studying the meaning and determinacy 
of factor scores (as distinct from factor loadings). Let $\delta_j^2$ be the (nonnegative)
difference between the left and right members of (1),

(c)    $\delta_j^2 = h_j^2 - \rho_j^2$   (j = 1, 2, ..., n).

It has been shown in [3] that $\delta_j^2$ is the variance of the difference between the
jth unique factor scores and the errors of estimate of the jth observed
variable from the multiple regression on the n - 1 remaining observed
variables (i.e., the jth anti-image scores). Therefore, if $\rho_j^2$ is very close to
$h_j^2$, $\delta_j^2$ must be close to zero; the jth unique factor scores must be essentially
equal to the jth anti-image scores. This provides a unique meaning for and 
determination of ‘‘unique’’ factors in such cases. 

A third important context in which inequalities (1) and (a) loom large 
is where $h_j^2$ differs substantially from $\rho_j^2$. It has been shown in [1] and [5]
that the left member of (a) above equals $r_j^2$, where $r_j$ is the multiple correlation
coefficient of the jth unique factor on the n observed variables. Thus, if
$h_j^2 = .7$ and $\rho_j^2 = .5$, then $r_j^2 = (1 - .7)/(1 - .5) = .6$. In such a case, only
60 per cent of the total variance of the jth unique factor is linearly determined 
by the n observed variables. Consequently, the unique factor is hardly 
“unique.’’ Many different sets of unique factor scores can be found to yield 
the same communality of .7 for the jth variable and to satisfy all other 
conditions for unique factor scores (i.e., correlate zero with unique factor 
scores for k # j and with all common factor scores). Furthermore, to any 
one such legitimate set of scores there corresponds another which is equally 
legitimate for the same data and the same jth observed variable, yet the 
correlation between the two sets is only .20. Should r; equal .5 instead of .6, 
then two different “unique” factors can always be found to fit the data 
legitimately for the same jth variable, and yet correlate exactly zero with 
each other. 
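
The arithmetic of this example can be sketched in a few lines of Python. The function names are hypothetical. The first function is the left member of (a) as quoted above; the second uses the expression $2r_j^2 - 1$ for the smallest correlation between two admissible sets of unique factor scores, which is not stated in this passage and is assumed here because it reproduces both quoted figures (.20 and zero).

```python
def unique_factor_r2(h2, rho2):
    """Squared multiple correlation of the jth unique factor on the n
    observed variables, (1 - h^2) / (1 - rho^2), as quoted in the text."""
    if not (0.0 <= rho2 <= h2 <= 1.0) or rho2 == 1.0:
        raise ValueError("need 0 <= rho2 <= h2 <= 1 and rho2 < 1")
    return (1.0 - h2) / (1.0 - rho2)


def min_correlation_between_solutions(r2):
    """Assumed form 2*r^2 - 1 for the smallest correlation between two
    legitimate sets of unique factor scores (an assumption, see lead-in)."""
    return 2.0 * r2 - 1.0


r2 = unique_factor_r2(0.7, 0.5)
print(round(r2, 2))                                     # 0.6
print(round(min_correlation_between_solutions(r2), 2))  # 0.2
print(min_correlation_between_solutions(0.5))           # 0.0
```

Under this assumption, any $r_j^2$ at or below .5 allows two admissible solutions that are uncorrelated or even negatively correlated.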

This multiplicity of solutions for unique variables for the same obser- 
vations has been analyzed in detail in [5]. It is also shown there how a similar 
multiplicity of solutions holds for each common factor separately, given any 
fixed set of common factor loadings, when the left and right members of (1) 
are substantially disparate for many values of j. The widespread practice 
of trying to name or attach meaning to factors merely by studying factor 
loadings is clearly suspect if the same loadings can be derived equally well 
from radically different sets of factor scores. 

In view of the demonstrated importance of inequality (1) for the com- 
munality problem and for the problem of assigning meaning to common 
and unique factors, it may be desirable to make its proof more readily 
accessible. The purpose of the present paper is to provide a simpler and 
more general proof of (1), avoiding any matrix algebra. This may help 
clarify the communality problem. Furthermore, the proof will extend (1) 
to the case where the right member belongs to a more general class of co- 
efficients than communalities. Equality (c) above is but a special case of 
a formula of the same form that will be established for the more general 
deviation law. Its proof is what yields (1) as a corollary (cf. [3], p. 293). Similarly, 
a simple proof of the formula for $r_j^2$ will be given in a more general context. 


The ¢-Law of Deviation 


It is possible to define a general law of deviation which is satisfied both 
by the errors of estimate from multiple regressions (anti-image scores) and 
by the unique factor scores of communality theory. This we shall call the 
¢-law, in distinction to other deviation laws defined in [4]. 

Let $x_j$ denote the jth observed variable, and $x_{ji}$ the score of individual 
i on $x_j$ (j = 1, 2, ..., n). Without loss of generality for our purposes, the 
expected values (arithmetic means) of all variables concerned may be set 
equal to zero, 


(2)    $E\,x_{ji} = 0 \qquad (j = 1, 2, \ldots, n).$


There are many ways of expressing each $x_j$ as the sum of two uncorrelated 
components. Let $y_j$ and $z_j$ be any two such components, so that 


(3)    $x_{ji} = y_{ji} + z_{ji}$

and 

(4)    $E\,y_{ji} z_{ji} = 0 \qquad (j = 1, 2, \ldots, n).$


An immediate consequence of (4) and (3) is that 

(5)    $\sigma_{x_j}^2 = \sigma_{y_j}^2 + \sigma_{z_j}^2 \qquad (j = 1, 2, \ldots, n).$


Of special interest is the class of cases where $z_j$ correlates zero with $x_k$ 
whenever j ≠ k, 


(6)    $E\,z_{ji} x_{ki} = 0 \qquad (j \ne k;\ j, k = 1, 2, \ldots, n).$


Any set of components for the given $x_j$ which satisfy (3), (4) and (6) will 
be said to obey the ¢-law of deviation; $z_j$ will be regarded as the deviant 
portion of $x_j$ and $y_j$ as the non-deviant portion. Condition (6) alone has been 
called the α-law of deviation [4], so the ¢-law is a special case of the α-law. 

Regression errors of estimate and unique factors both can serve as 
deviant components under the ¢-law. Other types of components than these 
also satisfy this law. However, the variance of errors of estimate of $x_j$ can 
never be smaller than the variance of any other $z_j$ under the law. Therefore, if 
$\eta_j^2$ is defined by 


(7)    $\eta_j^2 = \sigma_{y_j}^2 / \sigma_{x_j}^2 \qquad (j = 1, 2, \ldots, n),$

then from (5) 

(8)    $\rho_j^2 \le \eta_j^2 \qquad (j = 1, 2, \ldots, n).$


Since $h_j^2$ will be found to be an example of an $\eta_j^2$, (1) is but a special case of (8). 


The Multiple Regressions 


Let $w_{jk}$ be the multiple regression weight of $x_k$ for predicting $x_j$, and 
let $p_{ji}$ be the predicted value, or image, of $x_{ji}$, 


(9)    $p_{ji} = \sum_{k=1}^{n} w_{jk} x_{ki} \qquad (j = 1, 2, \ldots, n),$

it being understood that a variable is not used to predict itself, or 

(10)   $w_{jj} = 0 \qquad (j = 1, 2, \ldots, n).$

If $e_{ji}$ is the error of prediction, or anti-image, of $x_{ji}$, then 

(11)   $x_{ji} = p_{ji} + e_{ji}.$

As is well known, in order that $\sigma_{e_j}^2$ be a minimum among error variances 
from all possible linear estimates of $x_j$ from the n − 1 remaining $x_k$, it is 
necessary and sufficient that $e_j$ correlate zero with each $x_k$ where k ≠ j, or 


(12)   $E\,e_{ji} x_{ki} = 0 \qquad (j \ne k;\ j, k = 1, 2, \ldots, n).$


A simple proof of this classical theorem is given below, in the section on 
the minimizing and maximizing properties of the anti-images. 
Multiplying (9) through by $e_{ji}$, taking expectations over i, and using 
(10) and (12) yield the result that an image and its own anti-image are 
always uncorrelated, 

(13)   $E\,p_{ji} e_{ji} = 0 \qquad (j = 1, 2, \ldots, n).$

Therefore, the set of all images and anti-images obeys the ¢-law, for (11), 
(13), and (12) are special cases of (3), (4), and (6) respectively, setting 
$p_j = y_j$ and $e_j = z_j$. 


Equalities and Inequalities for the ¢-law 


More important than the fact that multiple regression components 
satisfy the ¢-law is the role they play in setting bounds for any components 
satisfying this law. The following basic formulas will be established: 


(14)   $\sigma_{e_j}^2 = \sigma_{z_j}^2 + \sigma_{e_j - z_j}^2 \qquad (j = 1, 2, \ldots, n)$

and 

(15)   $\sigma_{y_j}^2 = \sigma_{p_j}^2 + \sigma_{y_j - p_j}^2 \qquad (j = 1, 2, \ldots, n),$


where $y_j$ and $z_j$ are any components satisfying the ¢-law. $e_j$ and $p_j$ are uniquely 
determined by the n observed $x_j$. 
First, note that not only $e_j$, but any $z_j$ correlates zero with $p_j$, 


(16)   $E\,p_{ji} z_{ji} = 0 \qquad (j = 1, 2, \ldots, n).$


This follows from multiplying (9) through by $z_{ji}$, taking expectations, and 
using (10) and (6). Indeed, (13) is but a special case of (16). Therefore, 
from (4) and (16), 


(17)   $E\,z_{ji}(p_{ji} - y_{ji}) = 0 \qquad (j = 1, 2, \ldots, n).$

But from (3) and (11), 

(18)   $p_{ji} - y_{ji} = z_{ji} - e_{ji},$

so (17) is equivalent to 


(19)   $E\,z_{ji}(z_{ji} - e_{ji}) = 0 \qquad (j = 1, 2, \ldots, n).$


If we write the tautology 

(20)   $e_{ji} = z_{ji} + (e_{ji} - z_{ji}),$

then (19) and (20) show that $e_j$ can be partitioned into two uncorrelated 
components, $z_j$ and $e_j - z_j$. Taking squares and expectations in (20) and 
using (19) yield (14). 


It is similarly true that 

(21)   $y_{ji} = p_{ji} + (y_{ji} - p_{ji});$


thus $y_j$ is partitioned into two uncorrelated components, and (15) follows. 
Explicit proof will be left to the reader. 
From (14) and (15) follow the basic inequalities 


(22)   $\sigma_{z_j}^2 \le \sigma_{e_j}^2, \qquad \sigma_{p_j}^2 \le \sigma_{y_j}^2 \qquad (j = 1, 2, \ldots, n).$


For a given value of j, the equalities in (22) will hold if and only if 
$\sigma_{e_j - z_j}^2 = \sigma_{y_j - p_j}^2 = 0$. [It is of course always true that 
$\sigma_{e_j - z_j}^2 = \sigma_{y_j - p_j}^2$ by virtue of (18).] 

As is well known, the multiple correlation coefficient $\rho_j$ for $x_j$ from its 
regression on the n − 1 remaining observed variables is given by the formula 


(23)   $\rho_j^2 = \sigma_{p_j}^2 / \sigma_{x_j}^2 \qquad (j = 1, 2, \ldots, n).$

Thus, $\rho_j^2$ is but a special kind of $\eta_j^2$ in view of (7). Dividing the second part 
of (22) through by $\sigma_{x_j}^2$ and using (7) and (23) yields (8). The equality in (8) 
for a given j will hold if and only if $\sigma_{p_j - y_j}^2 = 0$ for that j. 


The Case of Common and Unique Components 


It remains to be shown that a communality $h_j^2$ is a variety of $\eta_j^2$, and 
the task of proving (1) will be completed. 

The communality problem arises from the definition of unique components. 
A set of n variables $u_j$ (j = 1, 2, ..., n) is called by factor analysts 
(following Thurstone) a set of unique factors or components for the observed 
$x_j$ if and only if they satisfy the so-called β-law [cf. 4] 


(24)   $E\,u_{ji} c_{ki} = 0 \qquad (j, k = 1, 2, \ldots, n),$

where 

(25)   $c_{ji} = x_{ji} - u_{ji},$


and also the γ-law [cf. 4] 


(26)   $E\,u_{ji} u_{ki} = 0 \qquad (j \ne k;\ j, k = 1, 2, \ldots, n).$


The $c_j$ are called the common parts of the $x_j$, and the dimensionality of the 
$c_j$ has in the past been the usual point of departure for discussing the 
communality problem. Equality (24) says that all unique components correlate 
zero with all common components, while (26) states that the unique 
components are uncorrelated among themselves. The communality $h_j^2$ for $x_j$ 
from a given set of unique components is defined as the ratio 


(27)   $h_j^2 = \sigma_{c_j}^2 / \sigma_{x_j}^2 \qquad (j = 1, 2, \ldots, n).$


Similarly, the uniqueness of $x_j$ is defined as $\sigma_{u_j}^2 / \sigma_{x_j}^2$. 

The case of (24) when k = j shows that the $u_j$ and $c_j$ satisfy condition 
(4) of the ¢-law. Multiplying (25) through by $u_{ki}$, taking expectations over 
i, and using (24) and (26) show that 


(28)   $E\,u_{ki} x_{ji} = 0 \qquad (j \ne k;\ j, k = 1, 2, \ldots, n).$


Interchanging subscripts j and k in (28) does not change the equality, so 
the $u_j$ play the role of the $z_j$ in (6). Condition (3) is clearly satisfied by (25), 
so the proof that common and unique components obey the ¢-law is complete, 
with the $u_j$ playing the role of the deviant $z_j$. Consequently, the $c_j$ are 
special cases of $y_j$, and comparing (27) with (7) shows that $h_j^2$ is a special 
kind of $\eta_j^2$. Hence (1) follows from (8). 

It might be remarked that the δ-law of deviation can be defined as a 
combination of the β- and γ-laws [4]. Thus, a necessary and sufficient 
condition that a set of components be unique factors for the observed $x_j$ is that 
it satisfy the δ-law. The above proof shows that the δ-law is a special case 
of the ¢-law. Any set of components satisfying the δ-law must also satisfy 
the ¢-law. But not all components satisfying the ¢-law need satisfy the 
δ-law; for example, the $e_j$ and $p_j$ regression components satisfy the ¢-law 
but not, in general, the δ-law. 


Minimizing and Maximizing Properties of the Anti-Images 


It is a curious fact that among all $z_j$ satisfying the ¢-law, $e_j$ has the 
largest possible variance, while $\sigma_{e_j}^2$ is the smallest possible variance of errors 
for estimating $x_j$ as a linear function of the remaining n − 1 observed $x_k$. In 
one context, $\sigma_{e_j}^2$ is a maximum; in the other, it is a minimum. 

It may be instructive to give a simple algebraic proof that condition 
(12) is necessary and sufficient for the minimizing regression problem, along 
the lines of the previous argument concerning the ¢-law. 

Let $w_{jk}^*$ be an arbitrary set of real numbers (weights), except that 


(29)   $w_{jj}^* = 0 \qquad (j = 1, 2, \ldots, n).$

Let $p_{ji}^*$ be defined by 

(30)   $p_{ji}^* = \sum_{k=1}^{n} w_{jk}^* x_{ki}$

and $e_{ji}^*$ by 

(31)   $e_{ji}^* = x_{ji} - p_{ji}^*.$


Thus, $p_j^*$ is an arbitrary estimate of $x_j$ as a linear function of the remaining 
$x_k$. Given that $e_j$ satisfies (12), multiply (30) through by $e_{ji}$ and take 
expectations over i to see that 


(32)   $E\,p_{ji}^* e_{ji} = 0 \qquad (j = 1, 2, \ldots, n).$


From (13) and (32) 

(33)   $E\,e_{ji}(p_{ji} - p_{ji}^*) = 0 \qquad (j = 1, 2, \ldots, n).$


But from (11) and (31), 

(34)   $p_{ji} - p_{ji}^* = e_{ji}^* - e_{ji},$

so (33) becomes 


(35)   $E\,e_{ji}(e_{ji}^* - e_{ji}) = 0 \qquad (j = 1, 2, \ldots, n).$

Then consider the tautology 

(36)   $e_{ji}^* = e_{ji} + (e_{ji}^* - e_{ji});$

this expresses $e_j^*$ as the sum of two uncorrelated components according to 
(35), whence 


(37)   $\sigma_{e_j^*}^2 = \sigma_{e_j}^2 + \sigma_{e_j^* - e_j}^2 \qquad (j = 1, 2, \ldots, n).$

Clearly then 

(38)   $\sigma_{e_j^*}^2 \ge \sigma_{e_j}^2 \qquad (j = 1, 2, \ldots, n),$


the equality holding for any j if and only if $\sigma_{e_j^* - e_j}^2 = 0$. Hence, (12) is both 
necessary and sufficient for $\sigma_{e_j}^2$ to be a minimum. 

This proof of (38) is not only simpler than the usual one which involves 
the partial differential calculus, but is more complete. Sufficiency of (12) 
has been established here as well as necessity, with exact formula (37) for 
what happens when the best linear estimates are not used. 
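
The decomposition (37) can be checked numerically in the simplest case of n = 2 observed variables, where the only predictor of $x_1$ is $x_2$. The following is a minimal sketch in plain Python; the scores are made up for illustration and the function names are hypothetical.

```python
def second_moment(u, v):
    """E(u*v) for two zero-mean lists of scores."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


# Hypothetical zero-mean scores on two observed variables (illustration only).
x1 = [2.0, -1.0, 0.5, -1.5, 0.0]
x2 = [1.0, -2.0, 1.5, -0.5, 0.0]

# Least-squares weight of x2 for predicting x1; it makes E(e * x2) = 0, as in (12).
w = second_moment(x1, x2) / second_moment(x2, x2)
e = [a - w * b for a, b in zip(x1, x2)]  # anti-image of x1


def error_variances(w_star):
    """Return (variance of e*, variance of e, variance of e* - e)
    for an arbitrary weight w*, following (30), (31) and (36)."""
    e_star = [a - w_star * b for a, b in zip(x1, x2)]
    diff = [s - t for s, t in zip(e_star, e)]
    return (second_moment(e_star, e_star),
            second_moment(e, e),
            second_moment(diff, diff))


for w_star in (0.0, 1.0, -0.3):
    var_star, var_e, var_diff = error_variances(w_star)
    # Equation (37): the two printed quantities agree up to rounding error.
    print(w_star, var_star, var_e + var_diff)
```

Every choice of the arbitrary weight gives an error variance that exceeds the least-squares one by exactly the variance of the difference between the two error scores, which is the content of (37) and (38).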


The Determinacy Problem of the Common and Unique Components 


The ¢-law alone does not lead to a unique definition for deviant 
components for the $x_j$. Further restrictions are needed for this. If it is required 
additionally that $y_j$ be a linear function of the n − 1 $x_k$ for which k ≠ j, 
then the only possible $y_j$ and $z_j$ are $p_j$ and $e_j$, respectively (j = 1, 2, ..., n). 
This can be seen, for example, by regarding inequalities (22) and (38) 
simultaneously, 


(39)   $\sigma_{z_j}^2 \le \sigma_{e_j}^2 \le \sigma_{e_j^*}^2 \qquad (j = 1, 2, \ldots, n).$


Only $e_j$ can be an $e_j^*$ and a $z_j$ simultaneously, since an equality in (39) holds 
if and only if the variance of the difference of $e_j$ from the component in 
question vanishes. 

On the other hand, if it is required instead that the $z_j$ be uncorrelated 
among themselves (implying that the δ-law holds), this does not pin down 
the components; infinitely many satisfactory sets remain, as is well known 
in common-factor theory. Furthermore, the various possible components 
are in general not linearly determined by the observations. 

If components obeying the ¢-law are not linear functions of the observed 
$x_j$, they can be only estimated linearly from the observations. Let us now 
inquire into the nature of such regressions, and into the predictability of 
¢-law components. It again turns out that the special case of images and 
anti-images plays a central role. We shall prove that, if $r_j$ is the multiple 
correlation coefficient for predicting $z_j$ from the n $x_k$ (where now $x_j$ is 
included in the regression) and if $\sigma_{z_j} > 0$, then 


(40)   $r_j^2 = \sigma_{z_j}^2 / \sigma_{e_j}^2 \qquad (\sigma_{z_j} > 0).$


If $\sigma_{z_j} > 0$, it is always true that $\sigma_{e_j} > 0$, from (22), so the denominator on 
the right of (40) cannot vanish. Only the case of non-vanishing $\sigma_{z_j}$ is of 
interest for our regression problem. If $\sigma_{z_j} > 0$, then (40) shows that $r_j = 1$ 
if and only if $\sigma_{z_j} = \sigma_{e_j}$, which implies, as we have repeatedly seen from (14), 
that $\sigma_{e_j - z_j} = 0$, or that $z_j$ is essentially the same as $e_j$. In other words, we shall 
have 
THEOREM 1. Among all possible deviant components satisfying the ¢-law 
and which have positive variance, essentially the only ones which can be linear 
functions of the n observed variables are the anti-images. 

To prove these results, explicit formulas for the regression of each $z_j$ 
on the n observed variables will be established. Let $\omega_{jk}$ be the multiple 
regression coefficient of $x_k$ for predicting $z_j$, and let $\pi_{ji}$ be the predicted 
value of $z_{ji}$, 


(41)   $\pi_{ji} = \sum_{k=1}^{n} \omega_{jk} x_{ki}.$


It will now be immediately seen that $\pi_j$ is exactly related to $e_j$ by the formula 


(42)   $\pi_{ji} = r_j^2 e_{ji} \qquad (\sigma_{z_j} > 0).$

The right member of (42) is clearly a linear function of the $x_k$, and so must 
be of the form (41). Then all we need to show according to the proof in the 
last section (which holds for any multiple regression problem) is that the 
errors of estimate $z_j - \pi_j$ resulting from (42) correlate zero with each 
predictor, analogously to (12), 

(43)   $E\,(z_{ji} - \pi_{ji}) x_{ki} = 0 \qquad (j, k = 1, 2, \ldots, n).$


When j ≠ k, (43) obviously holds from (6), (12), and (42). So we need verify 
only the part where j = k. 

Multiplying (3) through by $z_{ji}$, taking expectations, and using (4) 
show that 


(44)   $E\,z_{ji} x_{ji} = \sigma_{z_j}^2 \qquad (j = 1, 2, \ldots, n).$


As a special case of (44), since $e_j$ is a $z_j$, 


(45)   $E\,e_{ji} x_{ji} = \sigma_{e_j}^2 \qquad (j = 1, 2, \ldots, n).$


Multiplying (42) through by $x_{ji}$, taking expectations, and using (45), with 
$r_j^2$ as given by the right member of (40), yield 

(46)   $E\,\pi_{ji} x_{ji} = \sigma_{z_j}^2 \qquad (j = 1, 2, \ldots, n).$


Then (43) follows from (44) and (46). 
Squaring both members of (42) and taking expectations show that 


(47)   $\sigma_{\pi_j}^2 = \sigma_{z_j}^4 / \sigma_{e_j}^2 \qquad (j = 1, 2, \ldots, n).$


Analogously to (23), the multiple correlation coefficient for predicting $z_j$ 
is given by 

(48)   $r_j^2 = \sigma_{\pi_j}^2 / \sigma_{z_j}^2 \qquad (j = 1, 2, \ldots, n).$

Therefore, dividing (47) through by $\sigma_{z_j}^2$ and using (48) yield (40). 

It may be noted that (42) and (40) imply that the $\omega_{jk}$ themselves in 
(41) can be related directly to the $w_{jk}$ of (9) by the formula 


(49)   $\omega_{jk} = -r_j^2 w_{jk} \qquad (j \ne k;\ j, k = 1, 2, \ldots, n).$

However, in contrast to (10) for j = k, 

(50)   $\omega_{jj} = r_j^2 \qquad (j = 1, 2, \ldots, n).$


While $z_j$ is uncorrelated with each $x_k$ for which k ≠ j, from the definition of 
the ¢-law, nevertheless $\omega_{jk}$ in general does not vanish when j ≠ k, according 
to (49). The variables $x_k$ with which $z_j$ is uncorrelated act as “suppressor” 
variables in aiding $x_j$ to estimate $z_j$. 

Formulas such as (40) and (49) have been established previously by 
several writers, largely by the use of determinants and/or matrix algebra, 
for the special case of unique factors (see [5] and the references therein). 
They hold more generally for any $z_j$ of the ¢-law. It is a remarkable fact 
that the infinitely many possible $z_j$ (of positive variance) have 
estimates from the n observed variables that differ from each other only by 
the constant of proportionality $r_j^2$, according to (42) and (40). As $r_j \to 1$, 
or $\pi_j \to z_j$, it must be that $\pi_j \to e_j$ according to (42) and hence $e_j \to z_j$. The 
only way for a deviant component (of positive variance) to be linearly 
determinate under the ¢-law is that it have the corresponding total 
anti-image as a limit [cf. 3]. 

Analogous considerations hold for the common components. We shall 
not go into detail here; the interested reader may wish to see the related 
discussion for the case of the δ-law in [5]. 


Compact Formulation of the ¢-Law 


In closing, it may be useful to state the ¢-law in a more compact form. 


For the given n observed $x_j$, a set of variables $z_j$ (j = 1, 2, ..., n) will be 
said to satisfy the ¢-law if and only if 

(51)   $E\,z_{ji} x_{ki} = \delta_{jk} \sigma_{z_j}^2 \qquad (j, k = 1, 2, \ldots, n),$


where $\delta_{jk}$ is Kronecker’s delta, 


(52)   $\delta_{jk} = \begin{cases} 1 & j = k \\ 0 & j \ne k \end{cases} \qquad (j, k = 1, 2, \ldots, n).$

(51) is equivalent to (3), (4), and (6) combined. Condition (6) is directly 
stated in (51), so all that is needed is to consider (51) for the case where 
j = k. Define $y_j$ by the identity 


(53)   $y_{ji} = x_{ji} - z_{ji}.$


This, then, is equivalent to (3). Multiply (53) through by $z_{ji}$, take 
expectations over i, and use (51) to see that (4) is satisfied. Hence, (51) is sufficient 
for (3), (4), and (6). That (51) is also necessary follows from (6) and (44). 

The fact that the whole concept of deviance is wrapped up in the $z_j$, as 
in (51), without direct reference to the non-deviant $y_j$, may help explain 
difficulties attending the communality problem in common-factor analysis. 
Past attempts to solve the problem have largely focused on the $y_j$ (that is, 
the $c_j$) rather than on the $z_j$ (the $u_j$). The dimensionality of the $c_j$ has often 
been taken as a point of departure, so that communalities have often been 
thought of in terms of reducing ranks of correlation matrices. But many 
aspects of the problem can be studied without considering dimensionality 
at all, as in the present paper. In particular, communalities can possibly be 
uniquely defined by considering only the ¢-law, and requiring only determinacy 
of the $z_j$ in the limit as $n \to \infty$. For cases where such determinacy holds, 
it must be that $h_j^2$ is the limit of $\rho_j^2$ as $n \to \infty$. No preliminary considerations 
of rank are required for such a conclusion, but analysis only of the law of 
deviation involved. 

Implications of lack of determinacy have been rather completely 
analyzed in [5] for the β-law, including the δ-law of common-factor theory. 


A general least squares solution for successive intervals is presented, 
along with iterative procedures for obtaining stimulus scale values, dis- 
criminal dispersions, and category boundaries. Because provisions for 
weighting were incorporated into the derivation, the solution may be applied 
without loss of rigor to the typical experimental matrix of incomplete data, 
i.e., to a data matrix with missing entries, as well as to the rarely occurring 
matrix of complete data. The use of weights also permits adjustments 
for variations in the reliability of estimates obtained from the data. The 
computational steps involved in the solution are enumerated, the amount 
of labor required comparing favorably with other procedures. A quick, yet 
accurate, graphical approximation suggested by the least squares derivation 
is also described. 


Since Thurstone first developed the scaling method of successive 
intervals, it has appeared in several essentially identical forms under various 
names, such as equal discriminability scaling [6] and graded dichotomies 
[1]. The procedure was first published as a psychological scaling method by 
Saffir [15], the basic rationale having been previously presented by Thurstone 
in his absolute scaling of psychological tests [16, 18]. 

The experimental procedure for the method of successive intervals 
requires n stimuli to be sorted into (k + 1) categories on some attribute 
continuum. This procedure yields a frequency distribution for each stimulus 
over several of the categories. The basic consideration in successive intervals 
scaling is whether or not these frequency distributions can be simultaneously 
converted to a common distribution, allowing unequal means and variances, 
on the same base line. The means of the converted distributions would then 
correspond to stimulus scale values and the standard deviations to what 
Thurstone has called “discriminal dispersions” [17]. Scale values for category 
boundaries are also obtained from the method of successive intervals, thus 

*This research was jointly supported in part by Princeton University, the Office 
of Naval Research under contract N6onr-270-20, and the National Science Foundation 
under grant NSF G-642, and in part by Educational Testing Service. 
permitting estimates of the size of categories rather than assuming them 
to be equal as in the method of equal-appearing intervals [19]. 


Solutions to the Scaling Problem 


Successive intervals solutions for the n stimulus values have been 
suggested by Saffir [15], Guilford [9], Mosier [13], Bishop [2], Attneave [1], 
Garner and Hake [6], Edwards [3], Gulliksen [10], and Rimoldi [14]. Some 
of these articles also offer solutions for the n discriminal dispersions and the 
k category boundaries. The procedures vary in computational routine and 
with respect to certain restricting assumptions, but they are essentially 
equivalent. These procedures involve obtaining the proportion, $p_{ig}$, of 
times stimulus i was placed below the gth category boundary, $t_g$, and these 
cumulative proportions are then usually converted into normal deviate 
values, $z_{ig}$. Various successive intervals solutions presented in the literature 
have used normal curve transformations, but any similar function giving a 
one-to-one correspondence between $p_{ig}$ and $z_{ig}$ could be used (see [10]). 
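
The conversion of cumulative proportions to normal deviates can be sketched with the Python standard library. This is only an illustration: the function name is hypothetical, and returning `None` for proportions of exactly 0 or 1 (which have no finite normal deviate) is a placeholder convention assumed here.

```python
from statistics import NormalDist


def normal_deviates(cum_props):
    """Convert an n x k table of cumulative proportions p_ig into
    unit normal deviates z_ig; cells with p = 0 or p = 1 become None."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        [std_normal.inv_cdf(p) if 0.0 < p < 1.0 else None for p in row]
        for row in cum_props
    ]


# Hypothetical proportions for two stimuli and three category boundaries.
z = normal_deviates([[0.10, 0.50, 0.90],
                     [0.00, 0.25, 1.00]])
print(z)
```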

Category boundaries on the attribute scale may be expressed in terms 
of normal deviate values as follows: 


(1)    $t_g = m_i + s_i z_{ig},$

where  $z_{ig}$ = the normal deviate value corresponding to a cumulated 
                proportion, 
       $t_g$ = the upper boundary of the gth category, 
       $m_i$ = the scale value of stimulus i, and 
       $s_i$ = the discriminal dispersion for stimulus i. 

This equation is what Torgerson calls a special case of the Law of Categorical 
Judgment [20]. Algebraic solutions for $m_i$, $s_i$, and $t_g$ can be obtained from 
this relationship by arbitrarily choosing one of the $s_i$ values as a unit and 
one of the $m_i$ values or their average as an origin. 

Gulliksen [10] derived an explicit least squares solution for $m_i$, $s_i$, and 
$t_g$ by minimizing the following error term: 


(2)    $\frac{1}{b^2} \sum_{i=1}^{n} \sum_{g=1}^{k} (m_i + s_i z_{ig} - t_g)^2,$


where b is an arbitrary scale factor. 
The following restrictions, which fix an origin and a unit, were attached 
to the function to be minimized by Lagrange multipliers: 


(3)    $\sum_{g=1}^{k} t_g = ka$


and 


(4)    $\sum_{g=1}^{k} t_g^2 = k(a^2 + b^2),$


where k is the number of category boundaries, i.e., one less than the number 
of categories. These restrictions place the mean scale value of the category 
boundaries at a and their standard deviation at b. 


A General Least Squares Solution 


This paper is concerned with a generalization of Gulliksen’s successive 
intervals solution [10]. In an attempt to obtain a least squares procedure that 
would apply equally well to data with either complete or incomplete overlap, 
a weighting system was incorporated into the present derivation. Thus the 
“incomplete overlap” situation could be easily handled without loss of rigor 
by assigning weights of zero to the missing entries. As it turned out, the 
present iterative solution also involves a computational routine that is not 
excessively laborious and for which punched card procedures are appropriate 
[12]. 

The derivation proceeds by minimizing the following error term: 


(5)    $E = \frac{1}{b^2} \sum_{i=1}^{n} \sum_{g=1}^{k} w_{ig} (m_i + s_i z_{ig} - t_g)^2,$

where $w_{ig}$ is a weight that may be chosen in any fashion as long as $w_{ig} \ge 0$ 
and $w_{ig} z_{ig} = 0$ when $p_{ig} = 0$ or $p_{ig} = 1$. 

An arbitrary origin and unit were specified by setting the weighted 
mean scale value of the category boundaries at a and their standard deviation 
at b: 


(6)    $\sum_{g=1}^{k} (t_g - a) \sum_{i=1}^{n} w_{ig} = 0,$

(7)    $\sum_{g=1}^{k} (t_g - a)^2 \sum_{i=1}^{n} w_{ig} = W b^2,$


where  $W = \sum_{i=1}^{n} \sum_{g=1}^{k} w_{ig}$. 


These restrictions generalize for the weighted case the definitions used by 
Gulliksen in the unweighted case, see (3) and (4). Since the t-scale is 
determined only within a linear transformation, a and b may be set at any 
values desired; e.g., a convenient possibility for the origin, a, might be zero, 
and for the scale unit, b, which must be positive, might be unity. 
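
As an illustration of restrictions (6) and (7), the sketch below rescales an arbitrary set of boundary values linearly so that their weighted mean is a and their weighted standard deviation is b. The function name, argument layout, and example numbers are assumptions made for this illustration.

```python
import math


def standardize_boundaries(t, w, a=0.0, b=1.0):
    """Linearly transform boundary values t (length k) so that they meet
    (6) and (7) for the n x k weight table w, with origin a and unit b."""
    k = len(t)
    col = [sum(row[g] for row in w) for g in range(k)]  # sum over i of w_ig
    total = sum(col)                                    # W
    mean = sum(tg * c for tg, c in zip(t, col)) / total
    var = sum((tg - mean) ** 2 * c for tg, c in zip(t, col)) / total
    if var <= 0.0:
        raise ValueError("boundary values must not all coincide")
    sd = math.sqrt(var)
    return [a + b * (tg - mean) / sd for tg in t]


# Hypothetical weights for two stimuli and three boundaries; the zero
# weight marks a missing entry.
w = [[1.0, 2.0, 0.0],
     [1.0, 1.0, 3.0]]
print(standardize_boundaries([0.2, 1.0, 2.5], w))
```

Because a missing entry simply carries zero weight, the same routine covers both complete and incomplete data matrices.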

Using two Lagrange multipliers, γ and 2λ, the restrictions setting origin 
and unit may be included in the error term as follows: 


(8)    $Q = \frac{1}{b^2} \sum_{i=1}^{n} \sum_{g=1}^{k} w_{ig} (m_i + s_i z_{ig} - t_g)^2 + 2\lambda \Big( \sum_{g=1}^{k} t_g \sum_{i=1}^{n} w_{ig} - Wa \Big) - \gamma \Big\{ \sum_{g=1}^{k} t_g^2 \sum_{i=1}^{n} w_{ig} - W(a^2 + b^2) \Big\}.$


Except for the weights, $w_{ig}$, (8) is identical to the term minimized by 
Gulliksen, see (2), (3), and (4). His solution, then, is the special case of the 
present one in which all of the weights are equal to unity and $\sum_g w_{ig} = k$ 
and $\sum_i w_{ig} = n$. Because of this restriction to unit weights, only data with 
complete overlap could be considered. Equation (8) is also similar to the 
term minimized by Tucker [22] in developing a least squares solution to the 
normal ogive model for categorical data, which is formally equivalent to the 
successive intervals situation (see [21], chapter 13). Tucker’s solution, like 
the present one, involves weights and is iterative, but instead of minimizing 
the sum of squared differences between theoretical and estimated t-values, 
he minimized the sum of squared differences between theoretical and observed 
z-values. 

The differentiation of Q with respect to each of the $m_i$ in turn yields the 
n equations 


(9)    $\frac{1}{2} \frac{\partial Q}{\partial m_i} = \frac{1}{b^2} \sum_{g}^{k} w_{ig} (m_i + s_i z_{ig} - t_g)(+1) \qquad (i = 1, \ldots, n).$


After expanding and setting the partial derivative equal to zero, the solution 
for $m_i$ can be written as 


(10)   $m_i = \frac{\sum_{g}^{k} w_{ig} t_g - s_i \sum_{g}^{k} w_{ig} z_{ig}}{\sum_{g}^{k} w_{ig}}.$


Q is now differentiated with respect to each $s_i$ in turn to yield the n 
equations 


(11)   $\frac{1}{2} \frac{\partial Q}{\partial s_i} = \frac{1}{b^2} \sum_{g}^{k} w_{ig} (m_i + s_i z_{ig} - t_g)(z_{ig}) \qquad (i = 1, \ldots, n).$


Expanding, setting the partial derivative equal to zero, and substituting 
the value of $m_i$ from (10), 


(12)   $s_i = \frac{\big(\sum_g w_{ig} z_{ig} t_g\big)\big(\sum_g w_{ig}\big) - \big(\sum_g w_{ig} z_{ig}\big)\big(\sum_g w_{ig} t_g\big)}{\big(\sum_g w_{ig} z_{ig}^2\big)\big(\sum_g w_{ig}\big) - \big(\sum_g w_{ig} z_{ig}\big)^2}.$


For the purpose of parenthetical comment upon the form of $s_i$, (12) 
can be rewritten in the following manner: 


(13)   $s_i = \frac{\sum_g w_{ig} (t_g - \bar{t}_i)(z_{ig} - \bar{z}_i)}{\sum_g w_{ig} (z_{ig} - \bar{z}_i)^2},$

where 

       $\bar{z}_i = \frac{\sum_g w_{ig} z_{ig}}{\sum_g w_{ig}}$  and  $\bar{t}_i = \frac{\sum_g w_{ig} t_g}{\sum_g w_{ig}}.$


It is apparent from the form of (13) that $s_i$ is the slope of a regression line. 
It is the coefficient for the regression of t on z. This immediately suggests a 
graphical representation of the data, which will be presented in a later section. 
It is also interesting to note that Tucker’s solution [22], which minimized 
the sum of squared errors in the z direction instead of in the t direction, 
involves the other regression, that of z on t. 

Utilizing the value of $s_i$ obtained in (12), it is also possible to rewrite 
the formula for $m_i$ in terms of $t_g$ as follows: 


(14)   $m_i = \frac{\big(\sum_g w_{ig} z_{ig}^2\big)\big(\sum_g w_{ig} t_g\big) - \big(\sum_g w_{ig} z_{ig} t_g\big)\big(\sum_g w_{ig} z_{ig}\big)}{\big(\sum_g w_{ig} z_{ig}^2\big)\big(\sum_g w_{ig}\big) - \big(\sum_g w_{ig} z_{ig}\big)^2}.$


As will be seen in a later section, this is the form of $m_i$ which it will be 
convenient to use in computational routines. 
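
Formulas (12), (13), and (14) for a single stimulus can be sketched as follows. The function names and the example numbers are hypothetical; the lists `w`, `z`, and `t` hold $w_{ig}$, $z_{ig}$, and $t_g$ over the k boundaries for one fixed i.

```python
def fit_stimulus(w, z, t):
    """Return (m_i, s_i) for one stimulus from equations (14) and (12)."""
    sw = sum(w)
    swz = sum(wg * zg for wg, zg in zip(w, z))
    swt = sum(wg * tg for wg, tg in zip(w, t))
    swzz = sum(wg * zg * zg for wg, zg in zip(w, z))
    swzt = sum(wg * zg * tg for wg, zg, tg in zip(w, z, t))
    denom = swzz * sw - swz ** 2             # shared denominator of (12), (14)
    if denom == 0:
        raise ValueError("need at least two distinct weighted z values")
    s = (swzt * sw - swz * swt) / denom      # equation (12)
    m = (swzz * swt - swzt * swz) / denom    # equation (14)
    return m, s


def slope_form(w, z, t):
    """Equation (13): s_i as the weighted regression slope of t on z."""
    sw = sum(w)
    z_bar = sum(wg * zg for wg, zg in zip(w, z)) / sw
    t_bar = sum(wg * tg for wg, tg in zip(w, t)) / sw
    num = sum(wg * (tg - t_bar) * (zg - z_bar) for wg, zg, tg in zip(w, z, t))
    den = sum(wg * (zg - z_bar) ** 2 for wg, zg in zip(w, z))
    return num / den


# Error-free hypothetical data generated from t_g = m_i + s_i * z_ig.
t = [-1.0, 0.0, 1.5]
w = [1.0, 2.0, 0.5]
z = [(tg - 0.4) / 2.0 for tg in t]
print(fit_stimulus(w, z, t), slope_form(w, z, t))
```

With error-free data the generating values of $m_i$ and $s_i$ are recovered up to rounding, and the slope form (13) agrees with (12).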
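Differentiating Q with respect to each $t_g$ in turn yields the k equations 


(15)   $\frac{1}{2} \frac{\partial Q}{\partial t_g} = \frac{1}{b^2} \sum_{i}^{n} w_{ig} (m_i + s_i z_{ig} - t_g)(-1) + \lambda \sum_{i}^{n} w_{ig} - \gamma t_g \sum_{i}^{n} w_{ig} \qquad (g = 1, \ldots, k).$

Expanding and setting the partial derivative equal to zero, 

(16)   $t_g \sum_{i}^{n} w_{ig} - \sum_{i}^{n} w_{ig} (m_i + s_i z_{ig}) + b^2 \lambda \sum_{i}^{n} w_{ig} - b^2 \gamma t_g \sum_{i}^{n} w_{ig} = 0.$


Define 

(17)   $v_g = \frac{\sum_{i}^{n} w_{ig} (m_i + s_i z_{ig})}{\sum_{i}^{n} w_{ig}}.$


Rearranging the terms of equation (16), dividing both sides by $\sum_i w_{ig}$, and 
substituting the definition of $v_g$ from (17), 


(18)   $(1 - b^2 \gamma) t_g = v_g - b^2 \lambda.$

The solution for $t_g$ can now be written as 

(19)   $t_g = \frac{v_g - b^2 \lambda}{1 - b^2 \gamma}.$

Summing (16) over g and utilizing the definition of the origin in (6), 

(20)   $Wa - \sum_{g}^{k} \sum_{i}^{n} w_{ig} (m_i + s_i z_{ig}) + W b^2 \lambda - W a b^2 \gamma = 0.$
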
Consider the term $\sum_g \sum_i w_{ig} (m_i + s_i z_{ig})$. By interchanging the order of 
summation and by inserting the values of $s_i$ and $m_i$ given in (12) and (14), 
respectively, this term may be written as 


       $\sum_{i}^{n} \Big( m_i \sum_{g}^{k} w_{ig} + s_i \sum_{g}^{k} w_{ig} z_{ig} \Big)$

(21)   $= \sum_{i}^{n} \frac{\big(\sum_g w_{ig} t_g\big) \Big[ \big(\sum_g w_{ig} z_{ig}^2\big)\big(\sum_g w_{ig}\big) - \big(\sum_g w_{ig} z_{ig}\big)^2 \Big]}{\big(\sum_g w_{ig} z_{ig}^2\big)\big(\sum_g w_{ig}\big) - \big(\sum_g w_{ig} z_{ig}\big)^2}.$


Utilizing the definition of an origin given in (6), the term can be further 
simplified to 


(22)   $\sum_{i}^{n} \sum_{g}^{k} w_{ig} (m_i + s_i z_{ig}) = \sum_{i}^{n} \sum_{g}^{k} w_{ig} t_g = Wa.$


Now, from (20), 

(23)   $\lambda = a\gamma.$


It should be noted in passing that (22), in terms of the $v_g$ of (17), indicates 
that the weighted mean of $v_g$ is 


(24)   $\bar{v} = \frac{\sum_{g}^{k} v_g \sum_{i}^{n} w_{ig}}{W} = a.$


If the above value of λ is substituted into (18) and $(1 - b^2 \gamma)a$ is 
subtracted from both sides of the equation, then (18) becomes 


(25)   $(1 - b^2 \gamma)(t_g - a) = v_g - a.$


Squaring both sides of (25), multiplying through by $\sum_i w_{ig}$, and summing 
over g, 


(26)   $(1 - b^2 \gamma)^2 \sum_{g}^{k} (t_g - a)^2 \sum_{i}^{n} w_{ig} = \sum_{g}^{k} (v_g - a)^2 \sum_{i}^{n} w_{ig}.$


Utilizing the definition of a scale unit from (7), (26) becomes 

(27)   $(1 - b^2 \gamma) = \sqrt{\frac{\sum_{g}^{k} (v_g - a)^2 \sum_{i}^{n} w_{ig}}{W b^2}}.$


Using the value of λ found in (23), the solution for $t_g$ given in (19) may 
be written as 


(28)   $t_g = \frac{v_g - a b^2 \gamma}{1 - b^2 \gamma} = \frac{v_g - a}{1 - b^2 \gamma} + a.$


Substituting the value of $(1 - b^2 \gamma)$ from (27), the solution for $t_g$ becomes 


(29)   $t_g = \frac{v_g - a}{\sqrt{\dfrac{\sum_{g}^{k} (v_g - a)^2 \sum_{i}^{n} w_{ig}}{W b^2}}} + a.$


It can also be shown that E, the sum of squares of errors, may be 
represented in terms of γ as follows: 


(30)   $E = W b^2 \gamma.$


The Iterative Procedure 


Since the ¢-scale of successive intervals is determined only within a 
linear transformation, the origin, a, and the scale unit, b, may be set at any 
values desired. The values most convenient for computational routines are 
a = O and b = 1. With such a placement of origin and unit, the restrictions 
given in (6) and (7) may be restated as 


k n 
(31) b a FY w,;,, = 0, and 
k n 
(32) b aE >» w;, = W, _ respectively. 


The formula for category boundaries given in (29) may also be rewritten in 
terms of this origin and unit as follows: 


(33) t. = a ae 
W » v; > Wi 


Now, (12), (14), (17), and (33) may be used to set up an iterative procedure
to obtain convergence values for $s_i$, $m_i$, and $t_g$. If some
initial estimates, $t_{g1}$, of the category boundaries were available, (12) could
be solved to obtain initial estimates, $s_{i1}$, of the discriminal dispersions. Of
course, the initial $t$-estimates, $t_{g1}$, should first be converted to meet the
restrictions of (31) and (32). Thus, if some set of $k$ numbers, $v_{g1}$, is available
to estimate the category boundary values, before they may be used in the
above solution, they must first be converted to meet the restrictions of (31)
and (32) as follows:


(34) $t_{g1} = \frac{v_{g1} - \bar{v}_1}{\sigma_{v1}},$


where

$\bar{v}_1 = \frac{1}{W} \sum_g^k v_{g1} \sum_i^n w_{ig}$ and $\sigma_{v1} = \left[ \frac{1}{W} \sum_g^k v_{g1}^2 \sum_i^n w_{ig} - \bar{v}_1^2 \right]^{1/2}.$

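Initial estimates of $s_i$ can then be obtained from (12) as follows:


(35) $s_{i1} = \frac{\left( \sum_g^k w_{ig} t_{g1} z_{ig} \right) \left( \sum_g^k w_{ig} \right) - \left( \sum_g^k w_{ig} z_{ig} \right) \left( \sum_g^k w_{ig} t_{g1} \right)}{\left( \sum_g^k w_{ig} z_{ig}^2 \right) \left( \sum_g^k w_{ig} \right) - \left( \sum_g^k w_{ig} z_{ig} \right)^2}.$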


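If a subscript $\alpha$ is introduced to indicate the $\alpha$th cycle in the iterative procedure,
(35) can be rewritten as a formula for the $\alpha$th estimate of $s_i$:


(36) $s_{i\alpha} = A_i \sum_g^k w_{ig} t_{g\alpha} z_{ig} - B_i \sum_g^k w_{ig} t_{g\alpha},$


where


(37) $A_i = \frac{\sum_g^k w_{ig}}{\left( \sum_g^k w_{ig} z_{ig}^2 \right) \left( \sum_g^k w_{ig} \right) - \left( \sum_g^k w_{ig} z_{ig} \right)^2}$


and


(38) $B_i = \frac{\sum_g^k w_{ig} z_{ig}}{\left( \sum_g^k w_{ig} z_{ig}^2 \right) \left( \sum_g^k w_{ig} \right) - \left( \sum_g^k w_{ig} z_{ig} \right)^2}.$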

Having found estimates for $t_g$ and $s_i$, the $\alpha$th estimate of the scale values
may be obtained from (14) as follows:


(39) $m_{i\alpha} = C_i \sum_g^k w_{ig} t_{g\alpha} - B_i \sum_g^k w_{ig} t_{g\alpha} z_{ig},$


where


(40) $C_i = \frac{\sum_g^k w_{ig} z_{ig}^2}{\left( \sum_g^k w_{ig} z_{ig}^2 \right) \left( \sum_g^k w_{ig} \right) - \left( \sum_g^k w_{ig} z_{ig} \right)^2}.$


Note that the components of (37), (38), and (40) are obtainable directly
from the data; they are the same for all cycles of iteration and need be
computed only once for the entire procedure. New estimates of $v_g$ may be
obtained from a formula analogous to (17):


(41) $v_{g(\alpha+1)} = \frac{\sum_i^n w_{ig} m_{i\alpha} + \sum_i^n w_{ig} s_{i\alpha} z_{ig}}{\sum_i^n w_{ig}}.$


A new estimate of the $t$-scale may now be found by using (33) as follows:


(42) $t_{g(\alpha+1)} = \frac{v_{g(\alpha+1)}}{\left[ \frac{1}{W} \sum_g^k v_{g(\alpha+1)}^2 \sum_i^n w_{ig} \right]^{1/2}}.$


The above procedure may be repeated by inserting this value of $t_{g(\alpha+1)}$
into (36) to obtain $s_{i(\alpha+1)}$, which in turn may be used to obtain $m_{i(\alpha+1)}$
and, subsequently, $t_{g(\alpha+2)}$. This cycle may then be iterated until two successive
estimates of $t_g$ are as similar as desired, i.e., until $|t_{g(\alpha+1)} - t_{g\alpha}|$ is negligible.
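The cycle just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' computing routine: the function names (`column_weights`, `standardize`, `cycle`, `iterate`) and the stopping tolerance are assumptions introduced here, the coefficients $A_i$, $B_i$, and $C_i$ are folded into the weighted-regression formulas rather than stored, and the $z_{ig}$ and $w_{ig}$ values are those of the errorless numerical example (Table 1) used later in the paper.

```python
import math

# Normal deviates z[i][g] and weights w[i][g] for 4 stimuli and 3 boundaries
# (values of the errorless numerical example, Table 1).
Z = [[0.0, 2.0, 5.0], [-0.5, 0.5, 2.0], [-2.0, 0.0, 3.0], [-2.0, -1.0, 0.5]]
W = [[2, 1, 0], [2, 2, 1], [1, 2, 1], [1, 2, 2]]


def column_weights(w):
    """Sum of w_ig over stimuli i, one total per boundary g."""
    return [sum(row[g] for row in w) for g in range(len(w[0]))]


def standardize(v, w):
    """Conversion (34): weighted mean 0 and weighted mean square 1."""
    col = column_weights(w)
    total = sum(col)
    mean = sum(c * x for c, x in zip(col, v)) / total
    var = sum(c * x * x for c, x in zip(col, v)) / total - mean ** 2
    return [(x - mean) / math.sqrt(var) for x in v]


def cycle(t, z, w):
    """One cycle: s_i by (36), m_i by (39), v_g by (41), new t_g by (42)."""
    s, m = [], []
    for zi, wi in zip(z, w):
        sw = sum(wi)
        swz = sum(a * b for a, b in zip(wi, zi))
        swzz = sum(a * b * b for a, b in zip(wi, zi))
        swt = sum(a * b for a, b in zip(wi, t))
        swtz = sum(a * b * c for a, b, c in zip(wi, t, zi))
        d = swzz * sw - swz ** 2  # common denominator of A_i, B_i, C_i
        s.append((sw * swtz - swz * swt) / d)
        m.append((swzz * swt - swz * swtz) / d)
    col = column_weights(w)
    v = [
        sum(wi[g] * (mi + si * zi[g]) for wi, zi, mi, si in zip(w, z, m, s)) / col[g]
        for g in range(len(t))
    ]
    scale = math.sqrt(sum(c * x * x for c, x in zip(col, v)) / sum(col))
    return [x / scale for x in v], s, m


def iterate(v_start, z, w, tol=1e-8, max_cycles=200):
    """Repeat cycles until successive t-estimates differ by less than tol."""
    t = standardize(v_start, w)
    for _ in range(max_cycles):
        t_new, s, m = cycle(t, z, w)
        if max(abs(a - b) for a, b in zip(t_new, t)) < tol:
            return t_new, s, m
        t = t_new
    return t, s, m


t_final, s_final, m_final = iterate([1, 2, 3], Z, W)
print([round(x, 5) for x in t_final])
```

Starting from the integers 1, 2, 3, the first call to `standardize` gives the equally spaced starting scale and each call to `cycle` corresponds to one pass through (36), (39), (41), and (42). Exact rather than four-place coefficients are used, so intermediate values may differ from hand-computed ones in the last decimal places.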

The one step remaining to be considered before the above iterative
procedure may be applied in practice is the initial estimation of the $t$-scale.
One obvious starting point might be a set of equally spaced numbers, such
as the integers from 1 to $k$, to which the conversion of (34) had been applied.
By using such equally spaced $t_{g1}$ values, a set of "equal-appearing" intervals
is used as the starting point for iteration to successive intervals. It may be
possible, also, to increase the rate of convergence in the iterative procedure
by doubling or tripling the difference between successive $t$-estimates, i.e.,
instead of using $t_{g(\alpha+1)}$ on the $(\alpha + 1)$ trial, use, for doubling,
$t'_{g(\alpha+1)} = t_{g\alpha} + 2(t_{g(\alpha+1)} - t_{g\alpha})$.

A cycle or two, and in some cases perhaps several cycles, may be eliminated
from the iterative procedure by using a computationally simple linear
solution for $t_g$ as a first estimate. One of the simplest methods for estimating
$t_g$ has been suggested by Garner and Hake [6] and by Edwards [3] and involves
averages of successive differences in $z_{ig}$ values. Such averages are estimates
of $(t_g - t_{g-1})$ provided the discriminal dispersions may be assumed equal.
Torgerson [21] also gives a simple algebraic ratio solution for $t_g$ which does
not require equal $s_i$. Any of these algebraic solutions may be used to obtain
initial estimates for the iterative procedure, but the labor involved might
turn out to be as great as that in the cycle or two eliminated.

Some comment is appropriate at this point concerning the weights, $w_{ig}$,
involved in the above least squares solution. It will be recalled that the only
restriction placed upon the choice of these weights was that


$w_{ig} = 0$ and $w_{ig} z_{ig} = 0$ when $p = 0$ or $p = 1$.


If $p$ equals neither zero nor one, the weights may be set at any values desired,
e.g., $w_{ig}$ may be set equal to unity for $0 < p < 1$.

Another possibility would be the Müller-Urban weights, $x^2/pq$, where
$x$ is an ordinate of the normal distribution corresponding to a proportion $p$
and $q = 1 - p$; a recent derivation of these weights is given by Finney [4].
The Müller-Urban weights are particularly appropriate for successive
intervals data, since they have the combined virtues of weighting directly
in proportion to the rate of change of $p$ with respect to $z$ and inversely to the
variance of the proportions [8]. Since the reciprocal of the variance may be
identified with quantity of information [5] and is directly related to the
reliability of a proportion, these weights would also be directly proportional
to reliability and to the information available from the observations. It can
also be shown that within an approximation the Müller-Urban weights are
the proper values for weighting normally transformed scores inversely to
their variance (see [11], p. 206).

However, it is possible to use a simpler set of weights than $x^2/pq$, without
sacrificing completely the differentiation between reliable and unreliable
$z_{ig}$ values. For instance, one possible rule for weighting would be to assign
zero if the corresponding proportion contained less than some specified
fraction, $1/r$, of the maximum possible information (corresponding to
$p = .5$) and unity if it contained more than $(1/r)$th the maximum information.
Or, all $|z_{ig}| > c$ could be weighted zero, and all $|z_{ig}| < c$ could be weighted
unity; such a rule with a value of $c = 2$ has been found to be convenient in
practice [7, 8]. The use of a simple set of unit and zero weights also simplifies
some of the procedures involved in the above iterative solution.
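The two kinds of weights discussed above can be compared numerically. The following is a minimal sketch using Python's standard `statistics.NormalDist`; the function names are hypothetical, and the treatment of the boundary case $|z| = c$ (given zero weight here) is an assumption, since the text does not specify it.

```python
from statistics import NormalDist

_STD = NormalDist()  # unit normal distribution


def muller_urban_weight(p):
    """x^2 / (p q), with x the normal ordinate at the deviate for p; 0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    ordinate = _STD.pdf(_STD.inv_cdf(p))
    return ordinate ** 2 / (p * (1.0 - p))


def cutoff_weight(z, c=2.0):
    """Unit weight for |z| < c, zero weight otherwise (|z| = c is given zero here)."""
    return 1.0 if abs(z) < c else 0.0


for p in (0.02, 0.10, 0.50, 0.90, 0.98):
    z = _STD.inv_cdf(p)
    print(p, round(z, 3), round(muller_urban_weight(p), 4), cutoff_weight(z))
```

The Müller-Urban weight is largest at $p = .5$ and falls off symmetrically toward the tails, which is the behavior the zero-and-unity rule approximates with a single cutoff.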


Summary and Illustration of Analytical Procedures 


The analytical procedures involved in the above least squares solution 
for successive intervals will now be summarized, and an errorless numerical 
example will be used to illustrate the computational routine. 

1. The experimental method of successive intervals yields the category
(1 to $k + 1$) into which each of $n$ stimuli was placed by each of $N$ individuals.
These data may be summarized into an $n \times (k + 1)$ table, the cell entries
of which, $f_{ig}$, represent the number of times the $i$th stimulus was placed in
the $g$th category. By cumulating the frequencies in each row of this table so
that each entry now represents the number of times the $i$th stimulus appeared
below the $g$th category boundary, $t_g$, a set of cumulated frequencies, $F_{ig}$, is
obtained, which can be considered to be the starting point for successive
intervals analysis.

2. The cumulated frequencies are then converted into proportions,
$p_{ig}$, and then to normal deviate values, $z_{ig}$. For the purpose of illustrating
computational procedures, consider the set of $z_{ig}$ values presented in Table
1. The four scale values, $m_i$, four discriminal dispersions, $s_i$, and three
category boundaries, $t_g$, which exactly fit these $z_{ig}$ values under the
restrictions of (31) and (32) are also given in Table 1. Knowing a "true" set
of scale values and category boundaries, the convergence of the above
iterative solution may be illustrated.


TABLE 1

Data for an Errorless Numerical Example, along with
"True" Scale Values and Category Boundaries

                  z_ig                  w_ig
Stimulus        Category              Category           m_i        s_i         t_g
             1      2      3        1    2    3
   1        0.0    2.0    5.0       2    1    0       -1.06458    .53229    t_1 = -1.06458
   2       -0.5    0.5    2.0       2    2    1        -.53229   1.06458    t_2 = 0
   3       -2.0    0.0    3.0       1    2    1         0         .53229    t_3 = 1.59687
   4       -2.0   -1.0    0.5       1    2    2        1.06458   1.06458


TABLE 2

First Iteration in the Numerical Example, Beginning with Equally Spaced t_g1

Stimulus     A_i      B_i      C_i     Σ w_ig t_g1   Σ w_ig t_g1 z_ig     s_i1       m_i1
   1        .3750    .2500    .5000     -2.17321         .31046          .65972    -1.16422
   2        .2381    .0952    .2381      -.54329        4.26883         1.06813     -.53575
   3        .0784    .0196    .2549       .62093        6.75251          .51723      .02593
   4        .2128   -.1277    .2766      2.09562        3.49267         1.01085     1.02566

t_11 = -1.16422    t_21 = .15523    t_31 = 1.47469
t_12 = -1.08759    t_22 = .03360    t_32 = 1.57282


TABLE 3

Second Iteration in the Numerical Example

Stimulus    Σ w_ig t_g2   Σ w_ig t_g2 z_ig     s_i2       m_i2
   1         -2.14158         .06720          .56060    -1.08759
   2          -.53516        4.26683         1.06688     -.53362
   3           .55243        6.89364          .52963      .00570
   4          2.12525        3.68080         1.05467     1.05788

t_13 = -1.06948    t_23 = .00750    t_33 = 1.59193

3. A set of weights, $w_{ig}$, and an initial estimate, $t_{g1}$, of the category
boundaries must now be determined. For the present example, it was decided
to use the weights given in Table 1; they were assigned so that


$w_{ig} = 0$ for $|z_{ig}| > 3.0$,
$w_{ig} = 1$ for $3.0 \ge |z_{ig}| \ge 2.0$,
$w_{ig} = 2$ for $|z_{ig}| < 2.0$.


It was also decided to use an equally-spaced scale as a first estimate
of $t_g$. Accordingly, using the integers 1, 2, and 3 as $v_{g1}$, the conversion of
(34) produced as $t_{g1}$ the values -1.16422, .15523, 1.47469. It should be noted
that if the "true" $t_g$ values given in Table 1 were used as first estimates in
the present iterative procedure, they would be exactly reproduced at the
end of one cycle.

4. Now, the coefficients $A_i$, $B_i$, and $C_i$ may be computed according
to (37), (38), and (40), respectively; these values are presented in Table 2,
along with the values of


$\sum_g^k w_{ig} t_{g1}$ and $\sum_g^k w_{ig} t_{g1} z_{ig}$.


5. Sufficient information is now available to solve for first estimates of
the discriminal dispersions, $s_{i1}$, using (36) (see Table 2).

6. Now, first estimates of the scale values, $m_{i1}$, can be obtained, using
(39) (see Table 2).

7. New estimates, $t_{g2}$, of the category boundaries may now be found
from (41) and (42). It will be noted that $t_{g2}$ given in Table 2 is closer to the
"true" $t$-scale than $t_{g1}$ was.

8. In order to iterate this solution, new values of $\sum_g^k w_{ig} t_{g2}$ and
$\sum_g^k w_{ig} t_{g2} z_{ig}$ must be computed. Using these values, (36) and (39), respectively,
may be solved to obtain $s_{i2}$ and $m_{i2}$. Then, (41) and (42) may be used to
obtain $t_{g3}$, the third estimate of the $t$-scale (see Table 3). It should again be
noted that the estimates of $t_g$ at each successive cycle are approaching closer
and closer the "true" $t_g$ scale. This procedure may now be repeated until
two successive $t$-estimates are as similar as desired.
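The preparatory steps 1 through 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' routine: the function names are hypothetical, the frequency row `[10, 20, 40, 30]` is invented purely to show the cumulation, and the handling of the boundary cases $|z| = 2.0$ and $|z| = 3.0$ (weight 1) follows the weights shown for the numerical example in Table 1.

```python
from statistics import NormalDist


def cumulative_deviates(freq_row):
    """Steps 1-2: category frequencies -> cumulated proportions -> normal deviates.

    freq_row has k + 1 category counts for one stimulus; the result has k values,
    one per category boundary. Proportions of 0 or 1 give None (no finite deviate).
    """
    n_judges = sum(freq_row)
    out, running = [], 0
    for f in freq_row[:-1]:
        running += f
        p = running / n_judges
        out.append(NormalDist().inv_cdf(p) if 0.0 < p < 1.0 else None)
    return out


def example_weight(z):
    """Step 3 rule of the numerical example: 0, 1, or 2 according to |z|."""
    if z is None or abs(z) > 3.0:
        return 0
    return 1 if abs(z) >= 2.0 else 2


def initial_boundaries(v, weights):
    """Conversion (34) applied to starting numbers v (one per boundary)."""
    col = [sum(row[g] for row in weights) for g in range(len(v))]
    total = sum(col)
    mean = sum(c * x for c, x in zip(col, v)) / total
    sd = (sum(c * x * x for c, x in zip(col, v)) / total - mean ** 2) ** 0.5
    return [(x - mean) / sd for x in v]


# Hypothetical frequencies for one stimulus judged by 100 individuals.
print(cumulative_deviates([10, 20, 40, 30]))

# z values of Table 1; weights follow from the rule above.
Z = [[0.0, 2.0, 5.0], [-0.5, 0.5, 2.0], [-2.0, 0.0, 3.0], [-2.0, -1.0, 0.5]]
W = [[example_weight(z) for z in row] for row in Z]
T1 = initial_boundaries([1, 2, 3], W)
print(W, [round(t, 5) for t in T1])
```

With the Table 1 deviates, the rule yields column weight totals of 6, 7, and 4 ($W = 17$), and the conversion of the integers 1, 2, 3 gives the equally spaced starting scale quoted in step 3.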


A Graphical Successive Intervals Scaling Procedure 


It was seen from (13) that $s_i$ is the regression coefficient for the regression
of $t$ on $z$. This suggests a graphical solution for successive intervals, which
will be summarized below; the procedures to be presented bear some similarity
to the graphical methods of Mosier [13] and Garner and Hake [6].

1-3. Steps 1 through 3 of the graphical procedure are identical to the
corresponding steps of the above analytical procedure. In order to utilize
the graphical method, a first estimate, $t_{g1}$, of the category boundaries must
be available, along with normal deviate values and their corresponding
weights.

4. The estimated $t_{g1}$ values are then marked off as the ordinate of a
graph with $z$ values used as the abscissa. The $t_{g1}$ values are horizontal lines
that hold for all stimuli, so several plots can be made on one graph (see
Figure 1). For each stimulus, the $z_{ig}$ values are plotted at the appropriate
$t_{g1}$ points, i.e., for stimulus 2 in the above numerical example, points would
be plotted at ($t_{11} = -1.164$, $z = -.5$), ($t_{21} = .155$, $z = .5$), and ($t_{31} = 1.475$,
$z = 2.0$), as illustrated in Figure 1. Weights can be applied in the graphical
procedure by clustering around each point a number of dots proportional
to the corresponding weight. A straight line can now be fitted to the points
by eye, giving more emphasis to those points with bigger dot clusters in
determining the slope of the line.


FIGURE 1

Graphical Solution for Numerical Example
Beginning with Equally Spaced $t_{g1}$
(abscissa: $z$ scale)

5. The equation of each of these lines can be written as


(43) $t_{g\alpha} = m_{i\alpha} + s_{i\alpha} z_{ig}.$


The slope, $s_{i\alpha}$, of each line is the $\alpha$th estimate of the discriminal dispersion,
and the intercept, $m_{i\alpha}$, when $z = 0$ is the $\alpha$th estimate of the scale value.
These intercepts and slopes can be read directly from the graph, but they
need not be recorded until the final iteration.

6. In practice, the straight lines fitted to the plotted values will rarely
cross every point; each point will usually deviate from the line by some
amount, the amount of this deviation in the vertical direction representing a
scaling error (see Figure 1). The vertical projection of a plotted point on the
fitted line produces another point, $\hat{t}_{ig}$, which, since it lies directly on the line,
represents a theoretical or fitted estimate of $t_g$ (see Figure 1). For a given
category boundary, there are $n$ fitted estimates $\hat{t}_{ig}$, one for each stimulus.
If the ordinates of these $\hat{t}_{ig}$ values are recorded, weighted averages of the
ordinates can be used to obtain a new estimate of $t_g$ as follows:


(44) $v_{g(\alpha+1)} = \frac{\sum_i^n w_{ig} \hat{t}_{ig\alpha}}{\sum_i^n w_{ig}},$

(45) $t_{g(\alpha+1)} = \frac{v_{g(\alpha+1)} - \bar{v}_{(\alpha+1)}}{\sigma_{v(\alpha+1)}},$

where

$\bar{v}_{(\alpha+1)} = \frac{1}{W} \sum_g^k v_{g(\alpha+1)} \sum_i^n w_{ig}$

and

$\sigma_{v(\alpha+1)} = \left[ \frac{1}{W} \sum_g^k v_{g(\alpha+1)}^2 \sum_i^n w_{ig} - \bar{v}_{(\alpha+1)}^2 \right]^{1/2}.$


The only value, then, that need be read from the graphs in going from one
iterative cycle to another is $\hat{t}_{ig\alpha}$. The slopes and intercepts corresponding
to discriminal dispersions and scale values do not need to be recorded until
the final iteration.

7. This new estimate of $t_g$ may now be plotted as the ordinate of a graph
with $z$ values marked off as the abscissa. The cycle may then be repeated
beginning at step 4 until two successive $t$-estimates are as similar as desired.
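As an arithmetic counterpart to fitting a line by eye, the sketch below fits the weighted least squares line for a single stimulus and reads off the fitted points used in step 6. It is only an illustration: `weighted_line` is a hypothetical helper, and substituting a least squares fit for the eye fit is an assumption made here for the sake of a reproducible example. The data are the three points plotted for stimulus 2 in step 4, with its Table 1 weights.

```python
def weighted_line(z, t, w):
    """Weighted least squares line t = m + s*z; returns (m, s)."""
    sw = sum(w)
    swz = sum(a * b for a, b in zip(w, z))
    swt = sum(a * b for a, b in zip(w, t))
    swzz = sum(a * b * b for a, b in zip(w, z))
    swtz = sum(a * b * c for a, b, c in zip(w, t, z))
    d = swzz * sw - swz ** 2
    s = (sw * swtz - swz * swt) / d
    m = (swzz * swt - swz * swtz) / d
    return m, s


# Stimulus 2 of the numerical example: z values, first boundary estimates, weights.
z2 = [-0.5, 0.5, 2.0]
t1 = [-1.16422, 0.15523, 1.47469]
w2 = [2, 2, 1]

m2, s2 = weighted_line(z2, t1, w2)
fitted = [m2 + s2 * z for z in z2]  # points on the line, read off in step 6
print(round(m2, 4), round(s2, 4), [round(x, 4) for x in fitted])
```

The intercept and slope correspond to the first estimates of the scale value and discriminal dispersion for this stimulus; the fitted ordinates are the quantities that would be averaged across stimuli in (44).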


REFERENCES 

[1] Attneave, F. A method of graded dichotomies for the scaling of judgments. Psychol. 
Rev., 1949, 56, 334-340. 

[2] Bishop, Ruth. Points of neutrality in social attitudes of delinquents and non-de- 
linquents. Psychometrika, 1940, 5, 35-45. 

[3] Edwards, A. L. The scaling of stimuli by the method of successive intervals. J. appl. 
Psychol., 1952, 36, 118-122. 

[4] Finney, D. J. Probit analysis. New York: Cambridge Univer. Press, 1952.

[5] Fisher, R. A. Theory of statistical estimation. Proc. Cam. Phil. Soc., 1925, 22, 700-725.










[6] Garner, W. R. and Hake, H. W. The amount of information in absolute judgments. 
Psychol. Rev., 1951, 58, 446-459. 

[7] Green, B. F. Attitude measurement. In G. Lindzey (Ed.), Handbook of social psy- 
chology. Cambridge, Mass.: Addison-Wesley, 1954. 

[8] Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1936. 

[9] Guilford, J. P. The computation of psychological values from judgments in absolute 
categories. J. exp. Psychol., 1938, 22, 32-42. 

[10] Gulliksen, H. A least squares solution for successive intervals assuming unequal
standard deviations. Psychometrika, 1954, 19, 117-139. 

[11] Kendall, M. G. The advanced theory of statistics. II. London: Griffin, 1948. 

[12] Messick, S., Tucker, L., and Garrison, H. A punched card procedure for the method 
of successive intervals. Princeton: Educational Testing Service, Research Bulletin 
55-25. 

[13] Mosier, C. I. A modification of the method of successive intervals. Psychometrika, 
1940, 5, 101-107. 

[14] Rimoldi, H. J. A. and Hormaeche, Marceva. The law of comparative judgment in 
the successive intervals and graphic rating scale methods. Princeton: Educational 
Testing Service, Research Bulletin 54-5. 

[15] Saffir, M. A comparative study of scales constructed by three psychophysical methods. 
Psychometrika, 1937, 2, 179-198. 

[16] Thurstone, L. L. A method of scaling psychological and educational tests. J. educ. 
Psychol., 1925, 16, 433-451. 

[17] Thurstone, L. L. A law of comparative judgment. Psychol. Rev., 1927, 34, 424-432. 

[18] Thurstone, L. L. The unit of measurement in educational scales. J. educ. Psychol., 
1927, 18, 505-524. 

[19] Thurstone, L. L. and Chave, E. J. The measurement of attitude. Chicago: Univer. 
Chicago Press, 1929. 

[20] Torgerson, W. S. A law of categorical judgment. In L. H. Clark (Ed.), Consumer
behavior. New York: New York Univer. Press, 1954. 

[21] Torgerson, W. S. Theory and method of scaling. Social Science Research Council 
(to be published). 

[22] Tucker, L. R. A level of proficiency scale for a unidimensional skill. Amer.
Psychologist, 1952, 7, 408 (Abstract).


Manuscript received 5/10/56 


Revised manuscript received 10/25/56 










PSYCHOMETRIKA—VOL. 22, NO. 2 
JUNE, 1957 


ON THE APPLICATIONS OF 
THE METHOD OF ABSOLUTE SCALING* 


CHUNG-TEH FAN
EDUCATIONAL TESTING SERVICE 


Empirical and fictitious examples are described for investigating the 
applications of the absolute scaling method for item scaling and score scaling. 
A discrepancy between the correct values and the values estimated through 
the absolute scaling method is demonstrated. It is concluded that when the 
groups are different the assumption of an identity between test score con- 
version and item difficulty conversion is not met. 


Before investigating the applications of the absolute scaling method
for item scaling and score scaling, a brief explanation of Thurstone's
fundamental equations [2] will be presented.

Suppose ability scores are known for two groups of people, Groups 1
and 2, and suppose the ability scores are normally distributed for both groups.
The relationship between these groups can be described by an expression
derived as follows:


(1) $(X - M_1)/\sigma_1 = x_1,$

(2) $(X - M_2)/\sigma_2 = x_2,$


where $X$ is the ability score, $M_1$, $\sigma_1$ and $M_2$, $\sigma_2$ are the ability score means
and standard deviations, and $x_1$ and $x_2$ are sigma values of the ability scores
for Group 1 and Group 2, respectively. Solving (2) for $X$ and substituting in
(1) gives


(3) $x_1 = (\sigma_2/\sigma_1)x_2 + (M_2 - M_1)/\sigma_1.$


Thurstone assumes that this relationship can also be expressed in terms
of measures of item difficulty. The difficulty, $p$, of each item having been
determined for Group 1 and Group 2 separately, each of these $p$-values can
be converted to its corresponding normal deviate. The normal deviates
$(x_1', x_2')$ can be plotted for Group 1 versus Group 2 and the relationship
between these deviates can be expressed,


$(x_1' - m_1)/s_1 = (x_2' - m_2)/s_2,$


from which


(4) $x_1' = (s_1/s_2)x_2' + m_1 - (s_1/s_2)m_2,$


where $m_1$, $m_2$ and $s_1$, $s_2$ are the means and standard deviations of the normal
deviate values for the two groups.


*Acknowledgment is due to Dr. Gulliksen, Dr. Lord, Dr. Swineford, and Dr. Tucker
for their criticism and advice.

Assuming that (3) and (4) are alternative expressions for the same
relationship, Thurstone concludes that the slopes are equal, or


(5) $\sigma_2/\sigma_1 = s_1/s_2,$


and that the intercepts are equal, or


(6) $(M_2 - M_1)/\sigma_1 = m_1 - (s_1/s_2)m_2.$


It should be pointed out here that the identity of (3) and (4) cannot be 
demonstrated mathematically. Although there is a simple relationship 
between item difficulty and test score mean, there is no corresponding 
relationship between item difficulty and test score standard deviation without 
also taking into account the correlations between item scores and test scores. 
These correlations are not involved in either (3) or (4). Therefore, (5) and 
(6) can represent, at best, only approximations. In situations where the 
correlations can safely be ignored, Thurstone’s assumptions can lead to a 
very convenient scaling method. 

When the groups are similar, this scaling procedure produces useful 
results. When they are different, its value is open to doubt. The remainder 
of this paper will describe examples, using both real and fictitious data, 
that should serve to indicate the extent of the discrepancies that may be 
expected when the groups differ in certain specified respects. Accepting, 
then, Thurstone’s assumptions and an assumption of an identity between 
the test scores and the underlying ability, (5) and (6) may be applied to item 
scaling and score scaling in the following ways: 

For item scaling. To estimate the mean and standard deviation of the
item difficulties (hereafter "item difficulty" refers to normal deviate value
only) for Group 2 when the same test is given to Groups 1 and 2 and item-
analysis data are available for only Group 1, (5) and (6) can be combined
to give


(7) $\hat{m}_2 = [m_1 - (M_2 - M_1)/\sigma_1](\sigma_1/\sigma_2),$

(8) $\hat{s}_2 = s_1(\sigma_1/\sigma_2),$


where $\hat{m}_2$ and $\hat{s}_2$ are the estimated mean and standard deviation of the item
difficulties for Group 2; $m_1$ and $s_1$ are the observed mean and standard
deviation of the item difficulties for Group 1; $M_1$, $\sigma_1$ and $M_2$, $\sigma_2$ are test
score means and standard deviations for the two groups.

For score scaling. To estimate the test score mean and standard deviation
of test Form 1 for Group 2 (Group 1 takes only test Form 1 and Group 2
takes only test Form 2) when a set of common items is contained in both test
forms and when item-analysis data are available for both groups, (5) and
(6) can be employed as follows:


(9) $\hat{M}_2 = \sigma_1[m_1 - (s_1/s_2)m_2] + M_1,$

(10) $\hat{\sigma}_2 = \sigma_1(s_1/s_2),$


where $m_1$, $s_1$ and $m_2$, $s_2$ are the observed means and standard deviations of
item difficulties of the items common to the two forms, and $\hat{M}_2$ and $\hat{\sigma}_2$ are the
estimated mean and standard deviation for Group 2 in terms of the same
scale used for the Group 1 data [4].

An empirical check on the applications of (7) through (10) can be made 
if the same test is given to two groups and if item analyses are obtained 
separately for these groups. There are two practical difficulties, however, in 
making such an empirical check. The first is to locate results for the same 
test given to two groups which are known to be different. The second is to 
avoid the complicating factor of the drop-outs (the items not answered by 
all the examinees toward the end of the test). If the two groups are not very 
different, the discrepancy between the estimated values and the correct 
values will not be very great in any case. If the drop-outs of the two groups
are at different rates they would also affect the item difficulties and test 
scores differently for the two groups. 

Recently, one form of the Selective Service College Qualification Test, 
which was item-analyzed separately for each of the four college classes, was 
found adequate for such an empirical check. The observed means and standard 
deviations of item difficulties and subtest scores (raw scores) can be sub- 
stituted in (5) through (10) for verifying the assumed relationships for any 
two of the classes. A discrepancy between the observed values and estimated 
values is found for any two of the classes, particularly for freshmen and 
seniors, the two extreme groups. In order to keep this paper short, only one 
representative example will be described for illustration. 

A subtest of the same form, designed to measure data interpretation, 
Items 61-90 (29 items; Item 73 was not scored), has the following statistics 
for freshmen and seniors, all of whom attempted every item (random sample 
of 500 papers for each group): 

Raw score mean and standard deviation:

Freshmen $M_1 = 20.0340$, $\sigma_1 = 4.2555$;

Seniors $M_2 = 22.5000$, $\sigma_2 = 4.2631$.

Item difficulty mean and standard deviation (for item data see Table 1):

Freshmen $m_1 = -.5810$, $s_1 = .6333$;

Seniors $m_2 = -.8745$, $s_2 = .5647$.


TABLE 1

Item Statistics for a Set of Twenty-nine Items Administered
to 500 Freshmen and 500 Seniors in College

            Proportion      Proportion
            of Freshmen     of Seniors
  Item      Who Answer      Who Answer      Difficulty      Difficulty
 Number     Correctly       Correctly       for Freshmen    for Seniors
              p_F             p_S              x_F             x_S
   61         .862            .898            -1.09           -1.27
   62         .888            .926            -1.22           -1.45
   63         .830            .892            -0.95           -1.24
   64         .794            .852            -0.82           -1.05
   65         .818            .882            -0.91           -1.19
   66         .508            .574            -0.02           -0.19
   67         .910            .944            -1.34           -1.59
   68         .920            .936            -1.41           -1.52
   69         .880            .928            -1.17           -1.46
   70         .742            .840            -0.65           -0.99
   71         .760            .852            -0.71           -1.05
   72         .718            .836            -0.58           -0.98
   74         .522            .652            -0.06           -0.39
   75         .868            .918            -1.12           -1.39
   76         .888            .942            -1.22           -1.57
   77         .608            .716            -0.27           -0.57
   78         .604            .754            -0.26           -0.69
   79         .506            .610            -0.02           -0.28
   80         .636            .782            -0.35           -0.78
   81         .328            .518            +0.45           -0.05
   82         .906            .930            -1.32           -1.48
   83         .920            .942            -1.41           -1.57
   84         .786            .852            -0.79           -1.05
   85         .836            .878            -0.98           -1.17
   86         .690            .724            -0.50           -0.59
   87         .430            .516            +0.18           -0.04
   88         .330            .532            +0.44           -0.08
   89         .358            .490            +0.36           +0.03
   90         .188            .386            +0.89           +0.29





Correlation between item difficulties for freshmen and seniors:


$r_{12} = .9867.$


Let us see to what extent these observed data satisfy (3), (4), and (7)
through (10). Equation (3) gives


$x_1 = 1.00x_2 + .58.$


Equation (4) gives


$x_1' = 1.12x_2' + .40.$


Neither the slopes nor the intercepts are equal, as (5) and (6) would require.

Similarly, for estimating the mean and standard deviation of item
difficulties for the senior group from the item-analysis data for the freshman
group, and the score data, (7) gives


$\hat{m}_2 = [m_1 - (M_2 - M_1)/\sigma_1](\sigma_1/\sigma_2) = -1.16,$


instead of the observed value, $m_2 = -.87$. Equation (8) gives


$\hat{s}_2 = s_1(\sigma_1/\sigma_2) = .63,$


instead of the observed value, $s_2 = .56$.

For estimating the test score mean and standard deviation for the
senior group from the observed item difficulty statistics, (9) gives


$\hat{M}_2 = \sigma_1[m_1 - (s_1/s_2)m_2] + M_1 = 21.74,$


instead of the observed value, $M_2 = 22.50$. Equation (10) gives


$\hat{\sigma}_2 = \sigma_1(s_1/s_2) = 4.77,$


instead of the observed value, $\sigma_2 = 4.26$.
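The arithmetic of this empirical check can be reproduced with a short sketch. The function names below are hypothetical and are introduced only for illustration; the formulas are (3), (4), and (7) through (10) as given above, with freshmen taken as Group 1 and seniors as Group 2. The value $m_1 = -.5810$ used here is the mean of the freshman difficulties listed in Table 1.

```python
def score_line(M1, sd1, M2, sd2):
    """Slope and intercept of (3): x1 = (sd2/sd1) x2 + (M2 - M1)/sd1."""
    return sd2 / sd1, (M2 - M1) / sd1


def item_line(m1, s1, m2, s2):
    """Slope and intercept of (4): x1' = (s1/s2) x2' + m1 - (s1/s2) m2."""
    return s1 / s2, m1 - (s1 / s2) * m2


def item_scaling(m1, s1, M1, sd1, M2, sd2):
    """Estimates (7) and (8) of the Group 2 item-difficulty mean and SD."""
    return (m1 - (M2 - M1) / sd1) * (sd1 / sd2), s1 * (sd1 / sd2)


def score_scaling(m1, s1, m2, s2, M1, sd1):
    """Estimates (9) and (10) of the Group 2 score mean and SD."""
    return sd1 * (m1 - (s1 / s2) * m2) + M1, sd1 * (s1 / s2)


# Freshmen are Group 1, seniors Group 2 (statistics quoted in the text).
M1, SD1, M2, SD2 = 20.0340, 4.2555, 22.5000, 4.2631
m1, s1, m2, s2 = -0.5810, 0.6333, -0.8745, 0.5647

print(score_line(M1, SD1, M2, SD2))
print(item_line(m1, s1, m2, s2))
print(item_scaling(m1, s1, M1, SD1, M2, SD2))
print(score_scaling(m1, s1, m2, s2, M1, SD1))
```

Rounded to two decimals, the printed pairs correspond to the slopes and intercepts of the two conversion lines and to the four estimated values that the text sets against the observed ones.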
The foregoing empirical check on the estimated values and observed
values is also shown in Figure 1. In Figure 1 the dotted line represents (3),
based on test score data. The solid line represents (4), based on item-difficulty
data. (Each dot represents an item.) The discrepancy seems too great to be
ignored.


Figure 1
Application of Equations (3) and (4) to real data


Observed data are always subject to sampling error, to which this discrepancy
might conceivably be ascribed. Let us, therefore, set up a fictitious
example in order to avoid the possible effects of such errors.

In order to describe a fictitious example the following formulas are
needed to express the mean and standard deviation of test scores in terms of item
statistics:


(11) $M = \sum p_j,$

(12) $\sigma = \sum r_j z_j,$


where $M$ is the raw-score mean, $p_j$ is the proportion of correct responses on
item $j$ ($p_j = R_j/N$), $\sigma$ is the raw-score standard deviation, $r_j$ is the biserial
correlation for item $j$, and $z_j$ is the ordinate of the unit normal curve at the
point $x_j$, the item difficulty for item $j$.

Equation (11) is obvious. A quick derivation of (12) follows one equation
which has been proved by Gulliksen [1]. In his Chapter 21, equation (20) is


(13) $\sigma = \sum r_{jt} s_j,$


where $r_{jt}$ is the point biserial correlation between item score and test score,
and $s_j$ is the item standard deviation, which can be written $\sqrt{p_j(1 - p_j)}$.

The point biserial correlation can be expressed in terms of the biserial
correlation:


(14) $r_{jt} = r_j (z_j/s_j).$


Substituting (14) in (13) gives (12). It should be noted that no approximations
were used in the derivations of (11) and (12). Therefore when $r_j$ and $z_j$ are
available for all the items in the test, they can be used to reproduce the
mean and standard deviation of the raw test scores exactly.

Suppose that a test given to two groups is composed of 100 items with
the following statistics.

There is one set of 50 equivalent items, as follows:

$p_1 = .30$ ($x_1 = .50$, $z_1 = .35$), $r_1 = .30$, for Group 1,

$p_2 = .50$ ($x_2 = .00$, $z_2 = .40$), $r_2 = .30$, for Group 2;

and another set of 50 equivalent items, as follows:

$p_1 = .69$ ($x_1 = -.50$, $z_1 = .35$), $r_1 = .30$, for Group 1,

$p_2 = .84$ ($x_2 = -1.00$, $z_2 = .24$), $r_2 = .30$, for Group 2.


Although these fictitious data are unrealistic in a sense, none of the
numerical values appears to be unreasonable. The means and standard
deviations of the test scores can be computed from (11) and (12):


$M_1 = 50$, $M_2 = 67$,

$\sigma_1 = 10.5$, $\sigma_2 = 9.6$.


The means and standard deviations of the item difficulties ($x$) can be
computed directly:


$m_1 = .00$, $m_2 = -.50$,

$s_1 = .50$, $s_2 = .50$.
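As a check on these figures, (11) and (12) can be applied directly to the item statistics of the fictitious test. The sketch below is illustrative only: the tuple layout and function names are assumptions introduced here, and the ordinate .40 for the Group 2 items with $p = .50$ is the unit normal ordinate at $x = 0$ (.3989) rounded to two places.

```python
# Each entry: (number of items, p, z, r) for one set of equivalent items.
GROUP_1 = [(50, 0.30, 0.35, 0.30), (50, 0.69, 0.35, 0.30)]
GROUP_2 = [(50, 0.50, 0.40, 0.30), (50, 0.84, 0.24, 0.30)]


def score_mean(item_sets):
    """Equation (11): raw-score mean as the sum of item proportions correct."""
    return sum(n * p for n, p, _z, _r in item_sets)


def score_sd(item_sets):
    """Equation (12): raw-score SD as the sum of biserial r times ordinate z."""
    return sum(n * r * z for n, _p, z, r in item_sets)


for name, sets in (("Group 1", GROUP_1), ("Group 2", GROUP_2)):
    print(name, round(score_mean(sets), 2), round(score_sd(sets), 2))
```

For Group 1 the unrounded sum of the item proportions is 49.5, which the text reports as 50; the other three values agree with those given above.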


Now let us apply these fictitious data to (3), (4) and (7) through (10):


(3) gives $x_1 = .91x_2 + 1.62$.

(4) gives $x_1' = x_2' + .50$.

(7) gives $\hat{m}_2 = -1.77$, instead of $m_2 = -.50$.

(8) gives $\hat{s}_2 = .55$, instead of $s_2 = .50$.

(9) gives $\hat{M}_2 = 55.25$, instead of $M_2 = 67$.

(10) gives $\hat{\sigma}_2 = 10.5$, instead of $\sigma_2 = 9.6$.


The discrepancy between the dotted line based on scores and the solid
line based on item difficulties is shown graphically in Figure 2, which shows
that the conversion line based on item difficulties and the conversion line
based on test scores are not the same line.


Figure 2
Application of Equations (3) and (4) to fictitious data


This fictitious example has eliminated the problem of unreliability of
the data and also avoided the problem of assumptions of normality. The
finding, consequently, leads us to question the relationship of slopes and
intercepts of test score conversion and item difficulty conversion explicitly
stated in (5) and (6), which are the direct consequences of the assumption
of an identity of test score conversion and item difficulty conversion, (3)
and (4). Let us investigate further (5) and (6).

Suppose a test is given to two groups, and either of the following two
simple cases occurs:

A. The means are equal but the standard deviations are different.
B. The means are different but the standard deviations are equal.

An equating method which is appropriate for the more general case should
certainly work for these two simple cases.

In Case A it is assumed that $M_1 = M_2$, $\sigma_1 \ne \sigma_2$, and $m_1 = m_2$, and
$s_1 \ne s_2$. But when $M_1 = M_2$, $m_1 = m_2$, (6),


$(M_2 - M_1)/\sigma_1 = m_1 - (s_1/s_2)m_2,$


gives no alternative but


$s_1/s_2 = 1$ or $s_1 = s_2$.


And thus (5),


$s_1/s_2 = \sigma_2/\sigma_1,$


also gives no alternative but


$\sigma_2/\sigma_1 = 1$ or $\sigma_1 = \sigma_2$.


The results, therefore, contradict our assumptions.
In Case B it is assumed that $M_1 \ne M_2$, $\sigma_1 = \sigma_2$, and $m_1 \ne m_2$, $s_1 = s_2$.
It is obvious that (5) is consistent with our assumptions. But (6) gives


$(M_2 - M_1)/\sigma_1 = m_1 - m_2.$


This result implies that when $\sigma_1 = \sigma_2$, $s_1 = s_2$, the intercept of the conversion
line determined by test score means and standard deviations varies with
the test score standard deviations, whereas the intercept of the conversion
line determined by item difficulty statistics is independent of the standard
deviations of item difficulties. This is evidently not true. It is conceivable
that when $\sigma_1$ and $\sigma_2$ (in this case $\sigma_1 = \sigma_2$) are small and the difference between
$M_1$ and $M_2$ is large the conversion line of test scores can be very different
from the conversion line of item difficulties, as has been shown by the fictitious
example.

The foregoing investigation seems sufficient for us to draw the con- 
clusion that (5) and (6) can be approximately accurate only when the two 
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groups are similar but larger error would result when the two groups differ 
substantially. How this kind of error can be systematically corrected is not 
known to the writer. It should be noted, however, that it is not related to 
sampling errors in the observed data [3]. Since we are dealing with fictitious 
data we may assume no error of measurement in the statistics of our fictitious 
example. Since we know that (5) and (6) are the direct consequences of the 
assumption of an identity of (3) and (4), then we also know that when the 
groups are different the fundamental assumption of the identity between 
test score conversion and item difficulty conversion, in such simple relations, 
is not met.


A method is presented for converting the scores on one form of a
test to those on another form of the same test. The method is particularly 
applicable to the case where each form has been administered to a different 
group and the only link between the two forms is a subset of items common 
to both. The proposed method, called the item method of conversion, has 
been applied to several tests for which other methods of conversion are 
available for comparison. The necessary data are limited to tests for which 
the total score is the criterion for item analyses. The method gives highly 
satisfactory results for all the tests to which it has been applied, particularly 
when the two groups are rather different, in which case the delta method 
(a different item method) is inappropriate. 


One of the problems arising in the construction of two (or more) forms 
of the same test is that of converting the scores to a common scale in order 
that the tests can be used interchangeably. In order to effect such a con- 
version, it is necessary to know, or to be able to estimate, the score statistics 
on both forms for the same group of examinees. 

The most obvious procedure is to administer both forms to an experimental
population. This procedure is sometimes followed, with special provisions
made for taking into account possible practice effect. Procedures
which involve an estimation of score statistics on one form from observed 
data for the other form, however, have a far wider range of applicability in 
practical testing work. 

There are in current use at Educational Testing Service several methods 
for estimating the means and standard deviations on two forms of a test for 
the same group. Two of these methods, to which reference will be made 
later, are the part-score method of conversion and the delta method of conversion. 
Each has advantages and disadvantages, which cannot be fully presented 
here. The purpose of this paper is to present a new conversion procedure 
which gives promise of a high degree of accuracy. Comparisons have been 
made with the methods named above. The new method will be called the 
item method of conversion. 


*The authors are only two of a group, including W. H. Angoff, F. M. Lord, and 
M. K. Schultz, all of whom have made important contributions to this paper.


Development of the Method


The description of the method will start with the relationship between 
the test score distribution and item statistics for a single group taking a 
single test form. Let 


$$p_j = R_j/N ,$$


where $p_j$ is the proportion of correct responses for item $j$,
$R_j$ is the number of correct responses for item $j$,
$N$ is the number of cases in the test score distribution,


and let

$$m_j = (M_j - M)/\sigma ,$$


where $m_j$ is the mean standardized test score of those answering item $j$
correctly,
$M_j$ is the mean raw test score of those answering item $j$ correctly, and
$M$, $\sigma$ are the mean and standard deviation of the rights scores on the
test.
For a test of n items the raw-score mean and standard deviation can be 
expressed in terms of the item statistics as follows: 


$$M = \sum_{j=1}^{n} p_j , \tag{1}$$

$$\sigma = \sum_{j=1}^{n} p_j m_j . \tag{2}$$


Equation (1) is obvious and familiar. Equation (2) may need explanation. 
A quick derivation follows from two equations which have been given by 
Gulliksen [2]. In his Chapter 21, equation (20) is 


$$\sigma = \sum_{j=1}^{n} r_j s_j , \tag{3}$$

and (32) is

$$r_j s_j = p_j (M_j - M)/\sigma , \tag{4}$$


where, in present notation,

$r_j$ is the point biserial item-test correlation,

$s_j$ is the item standard deviation, which can be written $\sqrt{p_j(1 - p_j)}$.

Equation (2) can be obtained by substituting (4) in (3).

It should be noted that no assumptions or approximations were used in 
the derivation of (1) and (2). When the appropriate item statistics are 
available they can be used to reproduce the mean and standard deviation of 
the test scores exactly. 
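
The exactness of (1) and (2) can be checked numerically. The following is a minimal sketch, assuming a small invented matrix of 0/1 item responses; the data and the function name `item_statistics` are illustrative and do not come from the paper. It uses the population form of the standard deviation (division by N), which is the form under which the identity holds exactly.

```python
import math

def item_statistics(responses):
    """Return (p, m, M, sigma) for a list of examinee rows of 0/1 item scores.

    p[j] is the proportion passing item j; m[j] is the mean standardized
    total score of those passing item j. sigma is the population SD.
    Assumes every item is passed by at least one examinee.
    """
    n_cases = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    M = sum(totals) / n_cases
    sigma = math.sqrt(sum((t - M) ** 2 for t in totals) / n_cases)
    p, m = [], []
    for j in range(n_items):
        passers = [t for row, t in zip(responses, totals) if row[j] == 1]
        p.append(len(passers) / n_cases)
        M_j = sum(passers) / len(passers)
        m.append((M_j - M) / sigma)
    return p, m, M, sigma

# Hypothetical responses: 6 examinees by 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
p, m, M, sigma = item_statistics(responses)
mean_from_items = sum(p)                                # equation (1)
sd_from_items = sum(pj * mj for pj, mj in zip(p, m))    # equation (2)
print(M, mean_from_items)
print(sigma, sd_from_items)
```

The two printed pairs should agree to within floating-point rounding, since the variance of the total score equals the sum of the item-total covariances $p_j(M_j - M)$.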

Thus far the relationship between the test score statistics ($M$ and $\sigma$)
and item statistics ($p_j$ and $m_j$) has been discussed only for a single group





FRANCES SWINEFORD AND CHUNG-TEH FAN 187 


taking a single test form. When the same test is given to two groups, two
sets of item statistics are available: $p_{j1}$ and $m_{j1}$, which can reproduce $M_1$
and $\sigma_1$ for Group 1, and $p_{j2}$ and $m_{j2}$, which can reproduce $M_2$ and $\sigma_2$ for
Group 2.

The relationship between corresponding values of $p_{j1}$ and $p_{j2}$ is nonlinear,
but a virtually linear relation may be expected between $x_{j1}$ and $x_{j2}$,
where $x_j$ is the normal deviate above which $p_j$ of the area under the normal
curve lies. The equation relating the two sets of normal deviates is


$$(x_{j2} - M_{x_2})/\sigma_{x_2} = (x_{j1} - M_{x_1})/\sigma_{x_1} , \tag{5}$$


where $M_x$ and $\sigma_x$ are the mean and standard deviation, respectively, of $x_j$.
It should be noted that the linear relation (5) will not hold if the test contains 
two or more kinds of material such that the two groups differ more with 
respect to one kind than with respect to the others, because of differences in 
sex or in special training or experience, for example. 

A similar linear equation may be written as a close approximation for 
relating $m_{j1}$ and $m_{j2}$:


$$(m_{j2} - M_{m_2})/\sigma_{m_2} = (m_{j1} - M_{m_1})/\sigma_{m_1} , \tag{6}$$


where $M_m$ and $\sigma_m$ are the mean and standard deviation of $m_j$.

Now let this test, for which (5) and (6) have been determined, be a
set of common items in the two forms, Form Y and Form Z, taken by Group
1 and Group 2, respectively, and assume that the equations established
through the common items can be applied to the rest of the items for determining
$x_{j2}$ and $m_{j2}$ values from $x_{j1}$ and $m_{j1}$ values. The estimated mean
and standard deviation on Form Y for Group 2 (for which the Form Z data
are known) can be computed by formulas (1) and (2) from the estimated
values of $m_{j2}$ and $p_{j2}$ (transformed from $x_{j2}$). The conversion equation relating
the two forms can then be written from the observed mean and standard
deviation on Form Z and the estimated mean and standard deviation on Form
Y, all computed for the same group. The conversion equation is simply the
formula derived by setting corresponding standard scores equal, and is written


$$Y = (\sigma_y/\sigma_z) Z - (\sigma_y/\sigma_z) M_z + M_y .$$
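
The steps just described can be sketched in code. This is only an illustration under stated assumptions: the function names (`p_to_x`, `linear_link`, `estimate_form_y_for_group2`, `convert_z_to_y`) and the item statistics are hypothetical, the lines (5) and (6) are fitted by equating means and population standard deviations of the common items, and the normal deviate is taken from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist, fmean, pstdev

_STD_NORMAL = NormalDist()

def p_to_x(p):
    """Normal deviate above which a proportion p of the area lies."""
    return _STD_NORMAL.inv_cdf(1.0 - p)

def x_to_p(x):
    """Inverse of p_to_x."""
    return 1.0 - _STD_NORMAL.cdf(x)

def linear_link(group1_values, group2_values):
    """Line that equates standard scores of the common items, as in (5) and (6)."""
    mean1, sd1 = fmean(group1_values), pstdev(group1_values)
    mean2, sd2 = fmean(group2_values), pstdev(group2_values)
    return lambda v1: mean2 + (sd2 / sd1) * (v1 - mean1)

def estimate_form_y_for_group2(p1, m1, common, p2_common, m2_common):
    """Estimate the Form Y mean and SD for Group 2 from Group 1 item statistics.

    p1, m1: statistics of all Form Y items in Group 1.
    common: indices (into p1, m1) of the common items.
    p2_common, m2_common: Group 2 statistics of those common items.
    """
    x_link = linear_link([p_to_x(p1[j]) for j in common],
                         [p_to_x(p) for p in p2_common])
    m_link = linear_link([m1[j] for j in common], m2_common)
    p2_hat = [x_to_p(x_link(p_to_x(p))) for p in p1]
    m2_hat = [m_link(m) for m in m1]
    mean_y = sum(p2_hat)                                  # equation (1)
    sd_y = sum(p * m for p, m in zip(p2_hat, m2_hat))     # equation (2)
    return mean_y, sd_y

def convert_z_to_y(z, mean_y, sd_y, mean_z, sd_z):
    """Conversion line obtained by setting standard scores equal."""
    return (sd_y / sd_z) * z - (sd_y / sd_z) * mean_z + mean_y

# Hypothetical item statistics for a four-item Form Y; items 0-2 are common.
p1 = [0.9, 0.7, 0.5, 0.3]
m1 = [0.2, 0.4, 0.5, 0.6]
common = [0, 1, 2]
# If Group 2 matches Group 1 on the common items, both links are the identity.
mean_y, sd_y = estimate_form_y_for_group2(p1, m1, common, p1[:3], m1[:3])
print(mean_y, sd_y)
```

In this degenerate check the two groups agree on the common items, so the estimates for Group 2 should simply reproduce the Group 1 values of (1) and (2).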


It should be noted that this method of conversion is based on a minimum 
number of assumptions. The principal assumptions used are: 


1) There is a linear relation between $x_{j1}$ and $x_{j2}$.

2) There is a linear relation between $m_{j1}$ and $m_{j2}$.

3) The common items have been selected to represent the remaining
items, and therefore the linear relation of $x_j$ and $m_j$ between the two groups
established through common items can be applied to all the items.

The three foregoing assumptions can generally be fulfilled in practice.
The first two assumptions can be checked by their plots and their correlations.
If the plots show a linear relation and the correlations are very high, these
assumptions have been met. The third assumption requires specific attention by the test constructor,
but this assumption cannot be avoided by any method of conversion which 
uses common items as a link. 

The item method of conversion has been compared with several other 
conversion procedures using real data. The test material in each case is 
homogeneous in nature, so that (5) describes the data with a high degree of 
accuracy. The most stringent check on the method consisted of equating a 
test to itself—a “‘conversion” which would never be required in practice but 
which serves admirably to identify weaknesses in conversion procedures. 
The 50-item test had been administered to two significantly different groups. 
The delta method of conversion, identical in principle with the Thurstone 
method of absolute scaling described by Fan [1], proved unsatisfactory, 
whether twenty or thirty items were treated as common to “both” forms, 
thus supporting the argument advanced by Fan. The part-score method 
was acceptable, whether based on twenty or on thirty items. The item 
method, also tried with twenty and with thirty items, reproduced the actual 
scores with even less error than did the part-score method, although additional 
evidence will be required to establish the superiority of either of these two 
procedures over the other. 

The results from these and other comparisons are promising. (A more 
complete description of the numerical examples can be obtained from the 
authors.) The item method can probably be used successfully with any test 
for which the total score is appropriately used as the criterion for item 
analysis. Applications to tests where part-score criteria are used have not 
yet been attempted.


A REVISED LAW OF COMPARATIVE JUDGMENT*


In contrast to Thurstone’s Law of Comparative Judgment, a model
in which a comparison pair and its complement are assumed to give rise to
two different distributions of differences is considered. The consequences
of this revised model for scaling problems are developed.


This paper presents a more general model of paired comparison scaling 
that follows from, but is less restrictive than, Thurstone’s [7] original state- 
ment of the Law of Comparative Judgment. 

Consider the Law of Comparative Judgment, Case V, as usually written: 


$$S_i - S_j = X_{ij} , \tag{1}$$


where $X_{ij}$ is the normal deviate corresponding to the proportion of occasions
in which stimulus $i$ is judged greater than $j$, and $S_i$ and $S_j$ are the scale
values or apparent magnitudes of the stimuli. The model was designed specifically
to provide for the scaling of a single set of stimuli of which $i$ and $j$ are two
members. It is assumed that when $i$ is the same as $j$ the effects of
the stimulus on the judgment behavior, i.e., the scale values $S_i$ and $S_j$, are
equal. Let this be called the identity assumption. In a strict theoretical
sense, not without important consequences, $i$ and $j$ can never be alike: by
definition each stimulus in a pair must maintain separate identity for the
subject, and therefore the stimuli must be in some sense physically different.
Typically, they are separated spatially or temporally: one is on the right
and the other on the left, or one is presented first and the other second.
Since position in the pair may have an effect on the apparent magnitude of
the stimulus, it cannot be assumed a priori that the values $S_i$ and $S_j$ are
necessarily equal. Time error and position preference are well-known
phenomena that contradict this identity assumption.

The identity assumption is implicit in the usual computational pro- 
cedures for paired comparison data, specified originally by Thurstone [7]. 
A kind of symmetry is forced on the matrix of proportions by an averaging
procedure such that $P_{ij}$, the proportion of judgments $i$ greater than $j$,
equals $1 - P_{ji}$. The result is a somewhat arbitrary cancelling out of any
effect of bias due to position in the comparison pair. Also, no provision is
made for using entries in the diagonal of the matrix if these proportions
differ from .50.


*The research in this article was supported jointly by the Army, Navy, and Air
Force under contract with the Massachusetts Institute of Technology.

Briefly, this paper presents a model in which a comparison pair and its 
complement are assumed to give rise to two different distributions of differ- 
ences on the psychological continuum; each distribution of differences over 
trials is assumed to be normal, but the distributions may have different 
means (and, ultimately, even different variances). If this assumption is
ever correct, then the original model is technically inaccurate in that the
average of two proportions each from a different normal distribution is not
itself a proportion in a normal distribution. However, since in practice the
assumption of a normal distribution is not crucial and, as Mosteller [5] says, “is
more in the nature of a computational device than anything else,” it would
be a mistake to assume that the usual computational procedures are seriously
in error in this sense. In another way, averaging a matrix that exhibits bias
can have a more serious consequence: as Mosteller [5] points out in connection
with his test of goodness of fit, the chi square values tend to be
spuriously low for such data. Thus, in addition to providing an explicit means 
of evaluating bias, the proposed method restores the validity of the statistical 
test in the event of such an effect. 

Since theoretically the identity assumption is never correct, and since 
it is inaccurate in certain applications, there is every advantage in discarding 
this restriction on the model. It is demonstrated below that a simple extension 
of the usual procedures leads to a meaningful paired comparison scale without 
the restrictive assumption. It is shown that the revised model has an im- 
portant consequence for scaling in the area of sensory psychophysics. The 
use of a more efficient experimental design of the paired comparison experiment 
is suggested for attitude measurement. Finally, it is shown that Thurstone’s 
model for successive intervals data follows more directly from the more 
general model of paired comparisons proposed here. 


The Revised Model 


To emphasize that location of the stimulus may affect its apparent value, 
let Case V of the Law of Comparative Judgment be written 


$$A_i - B_j = X_{ij} . \tag{2}$$


$A$ may represent the apparent magnitudes of the stimuli on the right and $B$
on the left. Or $A$ may be the first in order and $B$ the second. To obtain a set
of scale values, the $P_{ij}$ are used as obtained, and no attempt is made to
average $P_{ij}$ with $1 - P_{ji}$ across the diagonal. In some cases diagonal
entries are not obtained because the comparison of like stimuli is absurd,
as is usually the case for application of the model in attitude measurement.
These cells may be treated as unfilled cells in the matrix. No entry need be
assumed since a procedure will be discussed that permits scaling of incomplete
matrices. All empirical values of $P_{ij}$ are converted to $X_{ij}$ by the normal
integral transformation in the usual manner.
The least square solution of (2) for a complete matrix of data is given by


$$A_i = \Bigl(\sum_{j=1}^{n} B_j / n\Bigr) + \Bigl(\sum_{j=1}^{n} X_{ij} / n\Bigr) \tag{3}$$


and


$$B_j = \Bigl(\sum_{i=1}^{m} A_i / m\Bigr) - \Bigl(\sum_{i=1}^{m} X_{ij} / m\Bigr) . \tag{4}$$


This follows as a simple extension of the solution for symmetric matrices
demonstrated by Mosteller [4]. Since the origin of the scale is indeterminate,
it is convenient to set the average value of, say, $B_j$ equal to zero. Substituting
zero into (3) gives


$$A_i = \sum_{j=1}^{n} X_{ij} / n . \tag{3a}$$


The values of $A_i$ obtained from (3a) can be substituted in (4) to obtain $B_j$.
As a check, the sum of $B_j$ should be zero.
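
The complete-matrix solution can be sketched as follows. This is a minimal illustration, not the author's computing routine: the function name `scale_complete_matrix` and the scale values used to build the test matrix are hypothetical, and the normal integral transformation is taken from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def scale_complete_matrix(P):
    """Least square scale values for A_i - B_j = X_ij with mean(B) = 0.

    P[i][j] is the proportion of judgments in which the stimulus in row i
    was judged greater than the stimulus in column j. All cells must be
    filled and no proportion may equal 0 or 1.
    """
    std_normal = NormalDist()
    X = [[std_normal.inv_cdf(p) for p in row] for row in P]
    m, n = len(X), len(X[0])
    A = [sum(X[i]) / n for i in range(m)]                    # equation (3a)
    mean_A = sum(A) / m
    B = [mean_A - sum(X[i][j] for i in range(m)) / m         # equation (4)
         for j in range(n)]
    return A, B

# Hypothetical check: build proportions from known scale values, then recover them.
true_A = [0.5, 1.0, 1.4]
true_B = [-0.3, 0.0, 0.3]
P = [[NormalDist().cdf(a - b) for b in true_B] for a in true_A]
A, B = scale_complete_matrix(P)
print(A, B, sum(B))
```

Because the hypothetical `true_B` values already average zero, the recovered values should match the generating ones, and the sum of the recovered $B_j$ serves as the check mentioned in the text.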

For an incomplete matrix, a least square solution of (2) can be obtained
by the principle proposed by Gulliksen [3]. Find an approximate set of values
of $A_i$ (or $B_j$) by any convenient method. For example, values can be obtained
by Thurstone’s procedure of finding the mean difference between pairs of
values in adjacent rows or columns as an estimate of scale separation. Given,
say, values of $A_i$, find values of $B_j$ from (4). The sums apply only to filled
cells, and the value of $m$ (or $n$) must be adjusted for each column (or row). For
example, in the $j$th column of a matrix there may be values of $X_{2j}$, $X_{3j}$, and
$X_{4j}$. The other entries are missing, say, because the obtained proportions
were too small or too large to give stable estimates. The first term in (4) is
given by the sum $(A_2 + A_3 + A_4)/m$, where $m = 3$, the number of filled
cells in the column. The second term is the mean of the filled cells. With
values of $B_j$ obtained in this way, a new set of estimates of $A_i$ is obtained from
(3) in an analogous manner, that is, by summing over the filled cells in the
$i$th row and adjusting the value of $n$ according to the entries in the row.
Iterate until the sum of squared errors reaches a stable state for the desired
decimal accuracy of scale values.
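
The iteration for an incomplete matrix might be sketched as below. The function name `scale_incomplete_matrix`, the stopping tolerance, and the example values are assumptions made for illustration. One further assumption is labeled in the code: after each pass the $B_j$ are shifted to sum to zero, which fixes the otherwise indeterminate origin and does not change the squared error after the $A_i$ are recomputed.

```python
import math

def scale_incomplete_matrix(X, A_start, tol=1e-12, max_iter=500):
    """Alternate (4) and (3) over filled cells until the squared error is stable.

    X[i][j] is a normal deviate, or None for an unfilled cell. Every row and
    every column must contain at least one filled cell.
    """
    m, n = len(X), len(X[0])
    A = list(A_start)
    B = [0.0] * n
    previous_sse = math.inf
    sse = math.inf
    for _ in range(max_iter):
        for j in range(n):
            rows = [i for i in range(m) if X[i][j] is not None]
            B[j] = sum(A[i] - X[i][j] for i in rows) / len(rows)     # (4)
        shift = sum(B) / n            # assumed convention: mean of B is zero
        B = [b - shift for b in B]
        for i in range(m):
            cols = [j for j in range(n) if X[i][j] is not None]
            A[i] = sum(B[j] + X[i][j] for j in cols) / len(cols)     # (3)
        sse = sum((A[i] - B[j] - X[i][j]) ** 2
                  for i in range(m) for j in range(n) if X[i][j] is not None)
        if abs(previous_sse - sse) < tol:
            break
        previous_sse = sse
    return A, B, sse

# Hypothetical error-free data with the two extreme corner cells unfilled.
true_A = [0.4, 0.9, 1.5, 2.0]
true_B = [-0.6, -0.1, 0.2, 0.5]
X = [[a - b for b in true_B] for a in true_A]
X[0][3] = None
X[3][0] = None
A, B, sse = scale_incomplete_matrix(X, A_start=[0.0, 1.0, 2.0, 3.0])
print(A, B, sse)
```

With error-free data the squared error should fall to essentially zero; with real proportions it settles at the residual error of the model.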

The chi square test of goodness of fit of the model proposed by Mosteller 
[5] is readily adapted to the scaling of asymmetric matrices. Use of this test 
is illustrated below. 


Application to Some Data on Visual Judgments of Line Lengths 


In order to demonstrate the application of (2), paired comparison data 
were gathered on line-length judgments with the horizontal-vertical illusion
as a source of bias. Two thin, black lines were presented as the comparison
pair in the form of a cross, centered on white cards ten inches on a side. The 
lines varied in seven equal steps of one-sixteenth of an inch from 3 13/16 to 
4 3/16 inches. The viewing distance was about 30 inches. All of the forty-nine 
possible pairs were presented five times to each of ten subjects—except that 
the seven pairs of equal length appeared ten times per subject. 

The matrix of proportions pooled over subjects is shown in Table 1.


TABLE 1
Proportion of Times the Vertical Line Was Judged Greater
Than the Horizontal and Scale Values*

                              HORIZONTAL                        Scale
VERTICAL      1      2      3      4      5      6      7       Values
    1        .74    .66    .24    .18    .12     -      -         .66
    2        .86    .79    .48    .36    .26    .06     -        1.33
    3         -     .92    .80    .54    .30    .18    .08       1.66
    4         -     .94    .74    .82    .68    .26    .08       2.00
    5         -     .84    .94    .86    [?]    .70    .38       2.49
    6         -      -     .94    .86    .94    .74    .68       3.03
    7         -      -      -      -      -     .92    .79       3.82
  Scale
  Values     .04    .55   1.19   1.51   1.78   2.45   2.97

*Proportions less than .05 and greater than .95 omitted. 


Inspection of the major diagonal shows that the vertical lines were consistently 
judged longer when the two lines were equal in length. Likewise the sums of 
complementary pairs of off-diagonal entries are consistently greater than 
unity. The vertical and horizontal lines were scaled separately by the pro- 
cedure for an incomplete matrix outlined above. The iterative procedure 
began with estimates of $B_j$ obtained by the method of differences between
adjacent columns. The error variance in estimating the empirical values of
$X_{ij}$ was computed for each of four complete iterations, i.e., for each new set
of values of both $A_i$ and $B_j$. The percentages of error variance were 8.9339,
8.4221, 8.4169 and 8.4154; thus the scale values of the fourth iteration were 
judged to be adequately precise. 

The scale values are plotted against each other in Figure 1. Each point 
represents the scale values of a pair of lines of equal length. The straight line 
fitted to the data passes through the point representing the mean scale values 
and has unit slope. To the extent that the fit is satisfactory, the result shows 
that the scale values are linearly related and that the bias due to the illusion 
is a constant regardless of scale value. Thus, it seems legitimate in this case 
to find the average scale value for line length regardless of position. Such a
scale obviously would not differ appreciably from that obtained had values
of $P_{ij}$ and $1 - P_{ji}$ been averaged across the diagonal. However, it is clear
that the new procedure has a sounder basis because it permits explicit examination
of the interaction of bias with scale value. It also provides a basis
for using the biased entries in the main diagonal, a procedure in contrast
to the traditional one.


FIGURE 1
Plot of scale values of vertical versus horizontal lines of equal length. The fitted line is of
unit slope and passes through the mean.

Besides the average scale values for length, a single number represents 
the average bias, the effect of illusion, as shown in Figure 1. It is in the same 
psychological units as the scale of apparent length. This result may be con- 
trasted to the measure of illusion effect obtained by the Method of Limits. 
By this method the Point of Subjective Equality would be obtained, and the 
amount of bias would be expressed in physical units. It is clear that in general 
the interaction between the bias and main variables will be different if 
measured in equivalent physical units rather than in psychological units 
whenever the physical and psychological scales are non-linearly related. 
This problem is not encountered in these data because, as shown in Figure 2, 
there is little cause to reject the hypothesis that actual and apparent lengths 
are linearly related in the narrow region of the experiment.


FIGURE 2
Average apparent length versus actual length of line. Solid line fitted to data points by 
inspection. The dashed lines represent the scales expected when the illusion effect is 
accounted for. 


The model for a constant additive bias may be written explicitly

$$S_i - S_j + b = X_{ij} , \tag{5}$$


where $S_i$ and $S_j$ have the same numerical value when $i = j$ and $b$ is the bias
effect. Although a least square solution to (5) may be obtained, it is somewhat
tedious; it is likely that averages of corresponding values of $A_i$ and $B_i$
will serve in most practical applications. For the illusion data, initial values
of $S$ and $b$, estimated from weighted averages of $A_i$ and $B_i$, were only
trivially different from least square parameters.
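
A rough version of the averaging shortcut can be sketched as follows. Note the assumptions: the text refers to weighted averages, whereas this sketch uses simple unweighted ones; the function name `constant_bias_estimates` and the scale values are hypothetical; and the two scales are taken to satisfy $A_i = S_i + b/2$ and $B_i = S_i - b/2$ up to a common origin, which reproduces (5) when substituted into (2).

```python
def constant_bias_estimates(A, B):
    """Unweighted estimates of S_i and b from paired scale values.

    A[i] and B[i] must refer to the same stimulus in the two positions of
    the pair. The origin of S is arbitrary.
    """
    b = sum(a_i - b_i for a_i, b_i in zip(A, B)) / len(A)   # mean separation
    S = [(a_i + b_i) / 2 for a_i, b_i in zip(A, B)]         # position-free value
    return S, b

# Hypothetical scale values for three stimuli in each position.
A = [0.6, 1.3, 1.7]
B = [0.0, 0.7, 1.1]
S, b = constant_bias_estimates(A, B)
print(S, b)
```

If the bias is not constant over the scale, the differences $A_i - B_i$ will vary systematically with $S_i$, which is what the plot of one scale against the other is meant to reveal.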

For illustrative purposes, two chi square tests of goodness of fit were 
carried out, one for separate and one for combined scale values. It will be 
remembered that Mosteller’s [5] formula is


$$\chi^2 = \frac{\sum (\theta'_{ij} - \theta''_{ij})^2}{821/N} ,$$


where $\theta'_{ij}$ and $\theta''_{ij}$, for empirical and computed proportions, respectively,
are given by

$$\theta_{ij} = \arcsin \sqrt{P_{ij}} ,$$

with $\theta$ measured in degrees.
$N$ is the number of judgments per stimulus pair. Since $N$ was 100 for diagonal
entries and 50 for off-diagonal entries, the sums of chi squares were computed
separately for the two sections of the matrix and added together. The number 
of degrees of freedom for separate scales for vertical and horizontal is 23:
36 empirical values are estimated from 13 scale differences, i.e., 14 scale
values minus one parameter for the arbitrary origin. For the constant bias
model there are seven parameters, six scale differences plus one constant of
bias, and therefore 29 (= 36 - 7) degrees of freedom. Both tests were highly
significant (p < .001), probably due to heterogeneity of the results over
individuals.
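
The chi square computation can be sketched as below. The function name `arcsine_chi_square` and the example proportions are hypothetical; the angles are taken in degrees, for which the sampling variance of the arcsine transform is approximately 821/N. Allowing a separate N for each cell is equivalent to summing the two sections of the matrix separately and adding them.

```python
import math

def arcsine_chi_square(observed, fitted, n_judgments):
    """Sum of squared arcsine differences, each divided by its variance 821/N.

    observed, fitted: empirical and computed proportions for the filled cells.
    n_judgments: number of judgments behind each observed proportion.
    """
    chi_square = 0.0
    for p_obs, p_fit, n in zip(observed, fitted, n_judgments):
        theta_obs = math.degrees(math.asin(math.sqrt(p_obs)))
        theta_fit = math.degrees(math.asin(math.sqrt(p_fit)))
        chi_square += (theta_obs - theta_fit) ** 2 / (821.0 / n)
    return chi_square

# Hypothetical proportions: the first cell differs by 30 degrees, the second fits exactly.
observed = [0.75, 0.50]
fitted = [0.25, 0.50]
counts = [821, 100]
chi_square = arcsine_chi_square(observed, fitted, counts)
print(chi_square)
```

In an application, the computed proportions would come from the fitted scale values, for example as the normal integral of $A_i - B_j$ under (2), and the result would be referred to the degrees of freedom given in the text.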

The analysis of these data indicates that for scaling attributes of sensation 
the method has more significance than merely handling an annoying and 
trivial kind of bias. Typically the psychological magnitude of a stimulus is 
related principally to one physical property of the stimulus. For example, 
loudness is chiefly affected by the intensity of a tone. However, the apparent 
magnitude may also be affected to a small but significant extent by other 
physical parameters, e.g., loudness is also a function of frequency. It is obvious 
that the variable that permits identification of each member of a pair of 
stimuli, i.e., spatial location or temporal order, may itself have an effect of 
importance. This more general interpretation of Thurstone’s principle permits 
direct measurement of the effect of such a secondary parameter on the same 
psychological scale as that for the primary variable. 


Implications for Attitude Measurement 


It is apparent that for scaling stimulus objects that cannot be ordered 
along a physical continuum, in the study of attitudes, aesthetic preference, 
and the like, the scaling of bias is of secondary importance. Although the 
procedure outlined above should be carried out routinely on any paired 
comparison data, it is unlikely that position in the pair will often be a signifi- 
cant variable. Clearly, if the two scales are related by a line of unit slope, 
an average scale is meaningful. This is true even if there is a constant additive 
bias. There is, however, an important practical consequence of a general 
fact that has not yet been made apparent. Equation (2) and the procedures 
that follow apply even if the two sets of stimuli have no members in common 
whatever. That is, for example, “apple” or “orange” or “pear” may appear
on the left, each paired with “banana” or “grapes” or “tangerine” on the
right. If the procedure for solving (2) is carried through, the value of each
of the six fruits can be determined on a single scale of preference. Indeed,
the number of stimuli in each location need not even be the same. Accordingly,
(3) and (4) were written for different numbers of stimuli, m and n. If there is a
satisfactory fit of this unidimensional model of stimulus effect, then the
conclusion that the scale measures a single attribute of the stimulus is
warranted, whatever the sources of variance. Of course, in this extreme
case the use of mutually exclusive classes of stimuli in each location permits
no evaluation of the effects of location apart from the principal effect of the
stimulus. It may be satisfactory, however, to include only a few probe 
stimuli, common to both locations in the pair and scattered over the range of 
all stimuli on the continuum. The results for the common stimuli would 
indicate the extent of bias. Such a procedure, taken in conjunction with the 
fact that least square solutions for incomplete matrices are possible, leads 
to the practical consideration that the number of stimuli can be greatly 
enlarged at little expense by deliberately omitting many paired comparisons. 


Implications for the Method of Successive Intervals 


Recently there has been a revival of interest in a paired comparison
model of successive intervals data that Saffir [6] attributes to Thurstone.
Briefly, the scaling procedure follows from the assumption that the boundaries
between the successive intervals form a hypothetical, ordered set of stimulus
effects on the psychological continuum. Each judgment of a stimulus $i$
indicates, in effect, that it lies above some interval boundary $g$ and below
$g + 1$. From the (reverse) cumulated frequency of judgments of a stimulus
over the set of categories, the proportion of times $i$ lies above $g$ may be
obtained. This is interpreted according to Thurstone’s principle by converting
the proportion to a distance between stimulus and category boundary. For
example, Case V is written


$$S_i - T_g = X_{ig} , \tag{8}$$


in which $S_i$ is the apparent magnitude of the stimulus, $T_g$ is the scale value
of a boundary between intervals, and $X_{ig}$ is the unit normal deviate corresponding
to the proportion of times stimulus $i$ is judged greater than boundary
$g$. The application of this model is discussed briefly by Green [2] and will be
dealt with extensively in a forthcoming monograph by W. S. Torgerson. 

It is clear that the members of the stimulus pair in (8) are very different 
indeed. $S_i$ is an effect due to a stimulus object presented to the subject.
However, $T_g$ is in the extreme an entirely hypothetical construct, such as in
the case of absolute, numerical judgments of the magnitude of single stimuli.
In that case the boundary is dealt with as a stimulus effect expected on the
basis of instructions to the subject and confirmed by the fit of the model.
It is clear that application of (8) to successive intervals data involves two
assumptions: first, the Law of Comparative Judgment is meaningful without
the restriction to stimulus pairs drawn from a common set, and, second, the
boundaries between successive intervals may be interpreted as stimuli. The
first assumption is that made in the interpretation of (2) for paired stimulus
objects. Thus it is clear that the successive intervals model follows directly 
from the new interpretation of Thurstone’s principle and not from the model 
implied by the usual computational procedures. Of course, both assumptions 
underlying the successive intervals model suggest difficulties of interpreting 
the significance of the scale values obtained. These problems are dealt with 
elsewhere. However, it is worth commenting that Edwards [1] in a study of 
food preference found close agreement between scale values for stimuli by 
paired comparison and successive intervals methods; presumably his treatment 
of each set of data differs only in unimportant details from that recommended 
here. 

As far as computational procedures are concerned, the methods of 
determining scale values are the same for (8) as for (2). At this point, it 
should be noted that when empirical values of X are present only about 
the diagonal, often the case for successive intervals data, it is the writer’s 
experience that the iterative procedure may converge slowly to the least 
square scale values. One way to understand this circumstance is to realize 
that two distant stimuli are tied together solely by their distances from 
intermediate stimuli and not by any direct estimates. Another way to look at 
the matter is to note that the more holes in the matrix, the more weight is 
carried, in effect, by old estimates of scale values in estimating new values. 
Of course, the better the estimates of the proportions and the fit of the model, 
the fewer iterations will be required. 


The Complete Form of the Model 


The statement of the complete model is 
$$A_i - B_j = X_{ij} \sqrt{a_i^2 + b_j^2 - 2 r_{ij} a_i b_j} ,$$


in which the term under the radical allows for the variance associated with
the difference between scale values. Different discriminal dispersions, $a_i$
and $b_j$, are assigned to different members of the stimulus pair in a manner
analogous to the assumption for scale values.

For Case V, it may be assumed that the correlation is nearly constant 
and, following Mosteller [4], not necessarily zero. It is also reasonable to let 
the variance associated with a stimulus in each position of the pair be constant. 
Then, application of (2) implies only that 


$$a^2 + b^2 - 2rab = 1 .$$


Specifically it is not necessary to assume that Case V is restricted by the 
assumption that the dispersion of stimuli in one position of the pair is 
necessarily equal to the dispersion of those in the other. As pointed out by 
Torgerson [8], this statement has important consequences for the use of the 
model to fit successive intervals data in which the members of the pair are
so radically different. For example, in scaling sensory attributes it is often
likely in practice that the variance associated with the stimulus is small 
relative to that of the category boundaries. 

It is not essential here to carry the discussion beyond considerations of 
Case V, which is adequate for many applications. Other assumptions may 
be made and their implications written into an equation as dictated by the 
conditions and the results of specific experiments.


A FAST APPROXIMATE ALGEBRAIC FACTOR ROTATION
METHOD TO MAXIMIZE AGREEMENT BETWEEN LOADINGS
AND PREDETERMINED WEIGHTS


DAVID A. RODGERS


UNIVERSITY OF CALIFORNIA 


A method of rotating a set of orthogonal axes into a reference frame 
on which loadings are as nearly proportional to a predetermined set of 
weights as possible is presented. The method, an approximate algebraic 
solution, often requires some additional graphical refinement but eliminates 
most of the rotations involved in usual graphical solutions. Its primary 
value is speed and ease of calculation, involving only one matrix multipli- 
cation and solution of a simple formula to determine the rotation cosines. 


The analytic method of factor rotation presented here is computationally 
fast. It provides a useful supplement to graphic procedures when it is desired 
to rotate orthogonal reference axes, e.g., a centroid solution, into a reference 
frame on which loadings correspond maximally to a predetermined hypo- 
thetical set of weights. The hypothetical weights might correspond to pre- 
sumed simple structure loadings or to graded values along some variable 
that might conceivably be present in the factor space. It was originally 
used to determine whether there was a dimension in the factor space, de- 
termined by the intercorrelations of salesmen’s self-concept Q-sorts, that 
would rank the salesmen in the same way that they were known to be ranked 
by their supervisor on the basis of selling ability. 

The problem is to maximize the correlation between a predetermined 
set of weights and the desired factor loadings. Mosier [2] and Tucker [5] 
have offered exact solutions for this problem, applying them to the determi- 
nation of simple structure. When computational labor is not a problem, or 
when considerable investment has been made in determining the hypothetical 
weights to be matched, as in Eysenck’s “criterion analysis” [1], Tucker’s or 
Mosier’s solution is to be preferred. However, these solutions involve lengthy 
computational procedures, including calculation of an inverse matrix, that 
may seem unwarranted if the weights to be matched are only rough ap- 
proximations to presumed actual factor structure—as in the example of 
ranking of salesmen, and/or if only one or two reference axes are sought 
instead of a complete reference frame. 

A less precise but much faster solution can be achieved by maximizing 
the sum of cross products of the arbitrary weights, corrected to a zero mean, 
with the obtained factor loadings, thereby maximizing the xy cross product 
sum in the standard product moment correlation formula, 


(1) $r = \sum xy / (N \sigma_x \sigma_y)$, 


in which x is the arbitrary set of weights and y the desired loadings. N and 
$\sigma_x$ are fixed quantities determined by the weights chosen and the number 
of tests involved; $\sigma_y$ is the only uncontrolled variable in such an approximation. 
The approximation normally yields loadings with greater dispersion than 
would be obtained by an exact solution. The results are usually close enough 
to the desired solution that only slight additional adjustment by graphical 
methods is necessary. 
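
The role of the uncontrolled dispersion term in formula (1) can be illustrated with a minimal Python sketch. The function names and the two sets of loadings are hypothetical, chosen only for illustration; the weights are assumed to sum to zero, as the method requires.

```python
import math

def cross_product_sum(weights, loadings):
    """Sum of cross products, the quantity the approximate method maximizes."""
    return sum(w * y for w, y in zip(weights, loadings))

def product_moment_r(weights, loadings):
    """Formula (1), with x and y taken as deviations from their means."""
    n = len(weights)
    mean_x = sum(weights) / n
    mean_y = sum(loadings) / n
    dx = [w - mean_x for w in weights]
    dy = [y - mean_y for y in loadings]
    sigma_x = math.sqrt(sum(d * d for d in dx) / n)
    sigma_y = math.sqrt(sum(d * d for d in dy) / n)
    return sum(a * b for a, b in zip(dx, dy)) / (n * sigma_x * sigma_y)

weights = [3, 1, -1, -3]               # hypothetical weights with zero mean
narrow = [0.40, 0.20, -0.10, -0.30]    # hypothetical loadings
wide = [2 * y for y in narrow]         # same pattern, twice the dispersion

print(cross_product_sum(weights, narrow), product_moment_r(weights, narrow))
print(cross_product_sum(weights, wide), product_moment_r(weights, wide))
```

In this made-up case the second set of loadings doubles the cross-product sum while leaving r unchanged, which is why a larger cross-product sum does not by itself imply a larger correlation.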


Derivation of Formulas 


Let the cosines between orthogonal factors I, II, ..., N and the desired 
factor F be x, y, ..., u, respectively; let the arbitrary weights be $W_1, W_2, \ldots, W_j$, 
so chosen that they sum to zero; let the desired sum of cross products 
between the factor loadings on F and the arbitrary weights be T; let the sum 
of the cross products of the arbitrary weights with the corresponding loadings 
on Factor I be a; similarly let the sums of the cross products of the weights 
with the factor loadings on the other orthogonal factors II, ..., N be 
b, ..., n, respectively. 
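Then 

(2) $T = ax + by + \cdots + nu$, 

and 

(3) $x^2 + y^2 + \cdots + u^2 = 1 = x^2 + y^2 + \cdots + (1 - x^2 - y^2 - \cdots)$, 

so that 

(4) $T = ax + by + \cdots + n(1 - x^2 - y^2 - \cdots)^{1/2}$. 

T will be maximum when 

(5) $\partial T/\partial x = \partial T/\partial y = \cdots = 0$. 

Therefore, for T equal to a maximum, 

(6) $\partial T/\partial x = a - nx(1 - x^2 - y^2 - \cdots)^{-1/2} = 0$, 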


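and 

(7) $a/x = n/u$. 

Therefore 

(8) $a^2/x^2 = b^2/y^2 = \cdots = n^2/u^2 = K$. 

But from (3) 

$x^2 + y^2 + \cdots + u^2 = (a^2 + b^2 + \cdots + n^2)/K = 1$. 

Therefore 

(9) $K = a^2 + b^2 + \cdots + n^2$. 

Since 

(10) $a^2/x^2 = K$, 

then 

(11a) $x = \pm\sqrt{a^2/(a^2 + b^2 + \cdots + n^2)}$, 
(11b) $y = \pm\sqrt{b^2/(a^2 + b^2 + \cdots + n^2)}$, 

and 

(11c) $u = \pm\sqrt{n^2/(a^2 + b^2 + \cdots + n^2)}$. 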

Since a, b, ..., n can be determined, and since x, y, ..., u are the 
desired rotation cosines, the desired factor can be determined algebraically 
by means of the above formulas. The signs chosen for x, y, ..., u are the 
algebraic signs of the values obtained for a, b, ..., n, respectively. 
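
A brief check on this result, offered as a supplement and not as part of the original derivation: the Cauchy-Schwarz inequality yields the same cosines and indicates that the stationary point is a maximum when each cosine takes the sign of its cross-product sum.

```latex
T = ax + by + \cdots + nu
  \le \sqrt{a^2 + b^2 + \cdots + n^2}\,\sqrt{x^2 + y^2 + \cdots + u^2}
  = \sqrt{K},
\qquad \text{with equality when} \qquad
(x, y, \ldots, u) = \frac{(a, b, \ldots, n)}{\sqrt{K}} .
```

Under this choice the maximized cross-product sum is $T = \sqrt{K}$; reversing every sign gives the minimum, $-\sqrt{K}$.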

In matrix terms, if W is the matrix of arbitrary weights and $F_0$ the 
orthogonal factor matrix, then a, b, ..., n are the elements in the F column 
(F being the desired reference axis) of the $F_0'W$ matrix. The rotation cosines 
are calculated by formula (11) from these column entries. 
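
The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's own computing routine: the function names `rotation_cosines` and `rotated_loadings` and the small two-factor data set are hypothetical, and the weights are centered inside the function so that they sum to zero.

```python
import math

def rotation_cosines(loadings, weights):
    """Direction cosines from formula (11), signed like the cross-product sums.

    loadings: one row per test (or person), one column per orthogonal factor.
    weights:  one arbitrary weight per row.
    """
    mean_w = sum(weights) / len(weights)
    centered = [w - mean_w for w in weights]      # weights corrected to zero mean
    n_factors = len(loadings[0])
    # a, b, ..., n: cross-product sums of the weights with each factor's loadings
    sums = [sum(w * row[k] for w, row in zip(centered, loadings))
            for k in range(n_factors)]
    root_k = math.sqrt(sum(s * s for s in sums))  # square root of K in formula (9)
    if root_k == 0.0:
        raise ValueError("the weights are uncorrelated with every factor")
    return [s / root_k for s in sums]

def rotated_loadings(loadings, cosines):
    """Loadings on the desired factor F, one per row."""
    return [sum(c * v for c, v in zip(cosines, row)) for row in loadings]

# Hypothetical data: four tests, two orthogonal factors
loadings = [[0.6, 0.4], [0.5, 0.3], [0.7, -0.2], [0.6, -0.5]]
weights = [3, 1, -1, -3]

cosines = rotation_cosines(loadings, weights)
print(cosines)
print(rotated_loadings(loadings, cosines))
```

Dividing each cross-product sum by the square root of K gives the magnitude in formula (11) and the required algebraic sign in one step.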


Sample Calculation 


An example may clarify the process further. Table 1 summarizes the 
factor loadings on four orthogonal centroid factors, removed from a matrix 
of intercorrelations of Q-sort descriptions of twelve salesmen. On the basis 
of company ratings, the salesmen were placed in pairs in six categories, from 
most competent to least competent, as shown in Table 1. It was desired to 
determine by algebraic rotation the one factor that would best discriminate 
the subjects according to their competence as salesmen. 

As shown in Table 1, arbitrary weights of +5, +3, +1, -1, -3, and 
-5 are assigned on the basis of the salesmen's ratings, +5 representing the 
highest rating and -5 the lowest one. The rotation cosines are computed 
as follows: 
a = 5 × .69 + 5 × .64 + 3 × .72 + ... + (-5) × .68 = +1.28. 
b = 5 × .43 + 5 × .39 + 3 × (-.06) + ... + (-5) × (-.28) = +6.14. 
c = 5 × .09 + 5 × .24 + 3 × (-.17) + ... + (-5) × .61 = -1.97. 
d = 5 × (-.04) + 5 × (-.11) + 3 × .42 + ... + (-5) × (-.15) = +2.39. 

Therefore 

$x^2 = (1.28)^2 / [(1.28)^2 + (6.14)^2 + (-1.97)^2 + (2.39)^2] = 0.033484$; x = +0.183. 

Similarly, y = +0.878, z = -0.282, and w = +0.342. 
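
The arithmetic above can be checked with a short Python sketch. It takes the four cross-product sums exactly as reported in the text; the variable names are illustrative only.

```python
import math

# Cross-product sums a, b, c, d from the sample calculation,
# keyed by the cosine each one determines
sums = {"x": 1.28, "y": 6.14, "z": -1.97, "w": 2.39}

K = sum(v * v for v in sums.values())            # formula (9)
cosines = {name: v / math.sqrt(K) for name, v in sums.items()}

for name, c in cosines.items():
    print(f"{name} = {c:+.3f}")
# prints x = +0.183, y = +0.878, z = -0.282, w = +0.342
```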
TABLE 1 
Company Ratings and Factor Data for Twelve Salesmen 

                                      Factor Loadings 
        Proficiency  Arbitrary     Centroid Factors        Calculated  Tucker's Exact 
Person  Rating       Weight      I     II    III    IV     Factor F    Rotation 
   1       6           +5      +.69  +.43  +.09  -.04       +.46        +.22 
   2       6           +5      +.64  +.39  +.24  -.11       +.35        +.12 
   3       5           +3      +.72  -.06  -.17  +.42       +.27        +.20 
   4       5           +3      +.74  +.21  -.23  +.24       +.47        +.30 
   5       4           +1      +.79  -.14  +.22  -.04       -.05        -.19 
   6       4           +1      +.73  -.45  -.07  +.15       -.19        -.20 
   7       3           -1      +.51  +.16  -.41  -.25       +.26        +.21 
   8       3           -1      +.54  -.28  -.27  -.29       -.17        -.22 
   9       2           -3      +.54  +.17  +.38  +.36       +.26        +.15 
  10       2           -3      +.60  +.26  -.06  -.18       +.29        +.10 
  11       1           -5      +.68  -.39  -.16  -.06       -.19        -.22 
TABLE 2 

Maximized Cross-Product Rotation Applied to 
Thurstone's Centroid Matrix of Fictitious Test Battery 

        Cross-Product Rotation          Actual Simple Structure 

Test    A    B    C    D    E           A    B    C    D    E 
1 - @-8 =e @io 3 £& 0. A 
2 -.11 .80 -.21 -.06 -.19 -0 9 0 0 0 
3 26031-2600 «026-030 | «5 5 0 § 0 
4 236 «= 657-22 2=.20 0-228 05 sT 20 0 0 
5 -.27 -.02 .O1 -.18 .48 2 2 ae oa ae 
6 a2 413 «489° 9) -- 26 0 4 6 4 0 
7 -% <M 8 +08 -Bi 0 0 8 6 & 
8 “15 .27 .52 -.21 -.27 0 5 oT 0 0 
9 as M Metso © £ £ 
10 =09) =533 13 «38 .20 0 0 A 6 5 
11 536 -=325 «392 07 -=<3k -6 fe) 05 ott -0 



Using x, y, z, and w as the rotation cosines, the loadings on factor F 
are obtained in the usual manner. For example, person one's loading on 
factor F = .183 × .69 + .878 × .43 + (-.282) × .09 + .342 × (-.04) = 
+.46. The computed loadings are shown in Table 1. The last column in 
Table 1 shows the results of Tucker's solution [5] for maximizing the 
correlation with the arbitrary weights. 
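
As a check on the "Calculated Factor F" column, the same weighted sum can be applied to several rows of Table 1. The sketch below uses the rounded cosines from the sample calculation and the centroid loadings of persons 1, 2, and 4 as read from Table 1; the helper name `loading_on_f` is illustrative.

```python
cosines = [0.183, 0.878, -0.282, 0.342]   # x, y, z, w from the sample calculation

# Centroid loadings (factors I-IV) for three persons, as read from Table 1
centroid = {
    1: [0.69, 0.43, 0.09, -0.04],
    2: [0.64, 0.39, 0.24, -0.11],
    4: [0.74, 0.21, -0.23, 0.24],
}

def loading_on_f(row, cosines):
    """Loading on the rotated factor F: sum of cosine times centroid loading."""
    return sum(c * v for c, v in zip(cosines, row))

f_loadings = {person: round(loading_on_f(row, cosines), 2)
              for person, row in centroid.items()}
print(f_loadings)   # {1: 0.46, 2: 0.35, 4: 0.47}
```

These values agree with the corresponding entries in the Calculated Factor F column of Table 1.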

An example of the accuracy of the method is given in Table 2. To 
illustrate his method of extended vectors, Thurstone presented a fictitious 
simple configuration that was then rotated into a centroid orthogonal matrix 
([3], pp. 208-209, and [4], pp. 230-231). Rerotation to the original simple 
structure therefore offers an exact test of rotational methods. Table 2 presents the 
results of the rotation method described here, in which the original simple 
structure loadings were used to determine the arbitrary weights. Although 
approximate, the solution clearly reveals the underlying simple structure 
when the data are plotted, making additional adjustment by graphical 
methods simple and straightforward. Figure 1 shows two sample plots. It 
should perhaps be emphasized that the purpose of this illustration is to 
show the relative ability of the method to duplicate desired arbitrary 
dimensions known to be present in the factor space and that the use of a simple 
structure configuration is regarded as only incidental to this purpose. 

Figure 1 
Maximized Cross-Product Factor Plots 
[Two plots of test points; axes labeled A, B and C, D.] 